cassandra를 운영하면서 CMS Full GC가 발생하면서 수십 초동안 stop the world (hang)이 발생했다. 이를 간단히 해결한 내용을 작성한다.
Case #1
node repair를 하고 난 직후 바로 다른 table(colum family)에서 compaction이 일어났고 CMS Full GC가 발생했다.
수십 초동안 stop the world (hang)이 발생했다.
<system.log>
INFO [MemtableFlushWriter:30358] 2017-02-10 19:18:38,616 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/googleplus/activity_counts-2b2d7da08ec211e6aa074b983626f83b/googleplus-activity_counts-tmp-ka-199677-Data.db (5.208MiB) for commitlog position ReplayPosition(segmentId=1483928416854, position=30283693)
INFO [Service Thread] 2017-02-10 19:23:54,659 GCInspector.java:252 - ParNew GC in 371ms. CMS Old Gen: 3756257272 -> 4008698560; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 99052984 -> 214695936
INFO [Service Thread] 2017-02-10 19:24:03,365 GCInspector.java:252 - ParNew GC in 206ms. CMS Old Gen: 5184283352 -> 5516390704; Par Eden Space: 1718091776 -> 0;
INFO [Service Thread] 2017-02-10 19:24:03,767 GCInspector.java:252 - ParNew GC in 223ms. CMS Old Gen: 5516390704 -> 5851059920; Par Eden Space: 1718091776 -> 0;
INFO [Service Thread] 2017-02-10 19:24:33,346 GCInspector.java:252 - ParNew GC in 21623ms. CMS Old Gen: 5851059920 -> 6084658992;
INFO [Service Thread] 2017-02-10 19:24:33,346 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [Service Thread] 2017-02-10 19:24:33,385 StatusLogger.java:66 - MutationStage 96 837 2082221481 0 0
INFO [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - RequestResponseStage 0 0 3415222266 0 0
INFO [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - ReadRepairStage 0 0 129691312 0 0
INFO [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - CounterMutationStage 14 0 61697119 0 0
INFO [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - ReadStage 40 23 1525495767 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - MiscStage 0 0 0 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - AntiEntropySessions 0 0 30954 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - HintedHandoff 0 1 4704 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - GossipStage 0 0 9002749 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - CacheCleanupExecutor 0 0 0 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - InternalResponseStage 0 0 27677 0 0
INFO [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - CommitLogArchiver 0 0 0 0 0
INFO [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - CompactionExecutor 2 65 3178052 0 0
INFO [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - ValidationExecutor 0 0 772255 0 0
INFO [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - MigrationStage 0 0 7 0 0
INFO [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - AntiEntropyStage 0 0 6186269 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,357 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:66 - MutationStage 96 2022 2082221481 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:66 - RequestResponseStage 0 0 3415222266 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:66 - ReadRepairStage 0 0 129691312 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - CounterMutationStage 14 0 61697119 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - ReadStage 43 329 1525495974 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - AntiEntropySessions 0 0 30954 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - HintedHandoff 0 1 4704 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - GossipStage 0 0 9002749 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - CacheCleanupExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - InternalResponseStage 0 0 27677 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - CommitLogArchiver 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - CompactionExecutor 2 65 3178052 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - ValidationExecutor 0 0 772255 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - MigrationStage 0 0 7 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - AntiEntropyStage 0 0 6186269 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - PendingRangeCalculator 0 0 11 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - MemtableFlushWriter 0 0 284744 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - MemtablePostFlush 1 1 969941 0 0
INFO [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - MemtableReclaimMemory 0 0 361430 0 0
WARN [GossipTasks:1] 2017-02-10 19:24:33,423 FailureDetector.java:249 - Not marking nodes down due to local pause of 30438450773 > 5000000000
INFO [GossipTasks:1] 2017-02-10 19:24:33,423 Gossiper.java:968 - InetAddress /1.1.1.1 is now DOWN
INFO [GossipTasks:1] 2017-02-10 19:24:33,426 Gossiper.java:968 - InetAddress /1.1.1.2 is now DOWN
INFO [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.3 is now DOWN
INFO [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.4 is now DOWN
INFO [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.5 is now DOWN
INFO [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.6 is now DOWN
INFO [GossipTasks:1] 2017-02-10 19:24:33,428 Gossiper.java:968 - InetAddress /1.1.1.7 is now DOWN
...
INFO [MemtableFlushWriter:30358] 2017-02-10 19:18:38,616 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/googleplus/activity_counts-2b2d7da08ec211e6aa074b983626f83b/googleplus-activity_counts-tmp-ka-199677-Data.db (5.208MiB) for commitlog position ReplayPosition(segmentId=1483928416854, position=30283693)
INFO [Service Thread] 2017-02-10 19:23:54,659 GCInspector.java:252 - ParNew GC in 371ms. CMS Old Gen: 3756257272 -> 4008698560; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 99052984 -> 214695936
INFO [Service Thread] 2017-02-10 19:24:03,365 GCInspector.java:252 - ParNew GC in 206ms. CMS Old Gen: 5184283352 -> 5516390704; Par Eden Space: 1718091776 -> 0;
INFO [Service Thread] 2017-02-10 19:24:03,767 GCInspector.java:252 - ParNew GC in 223ms. CMS Old Gen: 5516390704 -> 5851059920; Par Eden Space: 1718091776 -> 0;
INFO [Service Thread] 2017-02-10 19:24:33,346 GCInspector.java:252 - ParNew GC in 21623ms. CMS Old Gen: 5851059920 -> 6084658992;
Number Of Blocks이 굉장히 낮았음에도 불구하고 old gen이 여유가 있음에도 CMS Full GC가 발생했다.
<gc.log>
2017-02-10T19:24:03.543+0900: 2793841.374: [GC (Allocation Failure) Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 61954108
Max Chunk Size: 56091256
Number of Blocks: 281
Av. Block Size: 220477
Tree Height: 24
2793841.375: [ParNew
Desired survivor size 107347968 bytes, new threshold 1 (max 1)
- age 1: 214016272 bytes, 214016272 total
: 1887488K->209664K(1887488K), 0.2227276 secs] 7274588K->5923589K(8178944K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 23251391
Max Chunk Size: 23251391
Number of Blocks: 1
Av. Block Size: 23251391
Tree Height: 1
, 0.2230334 secs] [Times: user=2.82 sys=0.00, real=0.22 secs]
Heap after GC invocations=262308 (full 750):
par new generation total 1887488K, used 209664K [0x00000005c0000000, 0x0000000640000000, 0x0000000640000000)
eden space 1677824K, 0% used [0x00000005c0000000, 0x00000005c0000000, 0x0000000626680000)
from space 209664K, 100% used [0x0000000626680000, 0x0000000633340000, 0x0000000633340000)
to space 209664K, 0% used [0x0000000633340000, 0x0000000633340000, 0x0000000640000000)
concurrent mark-sweep generation total 6291456K, used 5713925K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 38684K, capacity 39508K, committed 39740K, reserved 1085440K
class space used 4042K, capacity 4201K, committed 4216K, reserved 1048576K
}
2017-02-10T19:24:03.766+0900: 2793841.598: Total time for which application threads were stopped: 0.2254495 seconds, Stopping threads took: 0.0007439 seconds
{Heap before GC invocations=262308 (full 750):
par new generation total 1887488K, used 1887488K [0x00000005c0000000, 0x0000000640000000, 0x0000000640000000)
eden space 1677824K, 100% used [0x00000005c0000000, 0x0000000626680000, 0x0000000626680000)
from space 209664K, 100% used [0x0000000626680000, 0x0000000633340000, 0x0000000633340000)
to space 209664K, 0% used [0x0000000633340000, 0x0000000633340000, 0x0000000640000000)
concurrent mark-sweep generation total 6291456K, used 5713925K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 38684K, capacity 39508K, committed 39740K, reserved 1085440K
class space used 4042K, capacity 4201K, committed 4216K, reserved 1048576K
2017-02-10T19:24:03.966+0900: 2793841.798: [GC (Allocation Failure) Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 23251391
Max Chunk Size: 23251391
Number of Blocks: 1
Av. Block Size: 23251391
Tree Height: 1
2793841.798: [ParNew (0: promotion failure size = 4) (1: promotion failure size = 15) (2: promotion failure size = 14) (3: promotion failure size = 8194) (4: promotion failure size = 15) (5: promotion failure size = 36) (6: promotion failure size = 4) (7: promotion failure size = 14) (8: promotion failure size = 8229) (9: promotion failure size = 14) (10: promotion failure size = 14) (11: promotion failure size = 4) (12: promotion failure size = 35) (13: promotion failure size = 4) (14: promotion failure size = 15) (15: promotion failure size = 4) (16: promotion failure size = 14) (17: promotion failure size = 8194) (promotion failed): 1887488K->1887488K(1887488K), 21.6232257 secs]2793863.422: [CMS2017-02-10T19:24:26.299+0900: 2793864.131: [CMS-concurrent-mark: 2.053/24.445 secs] [Times: user=128.39 sys=13.12, real=24.44 secs]
(concurrent mode failure)CMS: Large block 0x00000006ad3f4960
: 5942049K->1789778K(6291456K), 7.7547756 secs] 7601413K->1789778K(8178944K), [Metaspace: 38684K->38684K(1085440K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 576198356
Max Chunk Size: 576198356
Number of Blocks: 1
Av. Block Size: 576198356
Tree Height: 1
, 29.3783154 secs] [Times: user=106.14 sys=4.60, real=29.38 secs]
Heap after GC invocations=262309 (full 751):
par new generation total 1887488K, used 0K [0x00000005c0000000, 0x0000000640000000, 0x0000000640000000)
eden space 1677824K, 0% used [0x00000005c0000000, 0x00000005c0000000, 0x0000000626680000)
from space 209664K, 0% used [0x0000000633340000, 0x0000000633340000, 0x0000000640000000)
to space 209664K, 0% used [0x0000000626680000, 0x0000000626680000, 0x0000000633340000)
concurrent mark-sweep generation total 6291456K, used 1789778K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 38684K, capacity 39508K, committed 39740K, reserved 1085440K
class space used 4042K, capacity 4201K, committed 4216K, reserved 1048576K
}
2017-02-10T19:24:33.345+0900: 2793871.177: Total time for which application threads were stopped: 29.3870599 seconds, Stopping threads took: 0.0070944 seconds
2017-02-10T19:24:33.348+0900: 2793871.180: Total time for which application threads were stopped: 0.0018513 seconds, Stopping threads took: 0.0002945 seconds
2017-02-10T19:24:33.357+0900: 2793871.188: Total time for which application threads were stopped: 0.0062973 seconds, Stopping threads took: 0.0002382 seconds
2017-02-10T19:24:33.375+0900: 2793871.206: Total time for which application threads were stopped: 0.0132915 seconds, Stopping threads took: 0.0113025 seconds
원인
young gen(from, to, eden) 영역의 데이터를 old gen으로 못 올리는 이슈가 있었다.
wide row도 영향이 있는 상태라 판단했다.
# case 2
SNS 서비스의 hotspotting 사용자(친구가 많은 사용자)가 글을 쓸 때 cassandra에서 timeout이 발생하기도 했고 어쩔 때는 한 번에 많은 사용자가 hotspotting 사용자의 데이터를 읽으면서 #case 1처럼 young 영역에서 CMS full gc가 발생했다.
이 문제를 해결하기 위해 gc를 간단히 튜닝했는데, 바로 문제가 해결되었다.
아래는 hotspotting 사용자가 글을 썼을 때 사용자들이 반응하였다. full gc와 full gc로 인한 cpu가 튀는 것을 볼 수 있다.
cassandra의 전체 메모리를 좀 늘리고 (8G -> 12G), -XX:CMSInitiatingOccupancyFraction의 값을 35로 낮췄다.
즉 young gen 영역을 높여서 node repair 이후의 compaction 이슈와 hotspotting 사용자의 데이터 때문에 cassandra가 흔들리지 않도록 했다. 아래와 같이 특정 사용자가 글을 써도 크게 영향을 주지 않음을 알 수 있었다.
그리고 -XX:CMSWaitDuration=10000를 -XX:CMSWaitDuration=100으로 나춰 최대한 gc 시간으로 인해서 튀는 일이 없도록 했다.
어떤 상황이 와도 full gc는 50~60ms가 나오도록 했다.
더 이상 cassandra는 흔들리지 않고 있다.
최대한 memory를 확보할 뿐 아니라 cassandra 질의할 때 최대한 데이터를 많이 얻는 구조(네트워크 용량 및 데이터를 얻음으로서 jvm 데이터 방지)도 조정하는 것이 좋을 것 같다.
또한 재시작 이후에 node간 copy가 일어나서 부하가 높아질 뿐 아니라, compaction이 발생한다. 특히 wide row 테이블 쪽에서 operationtimeout이 발생할 수 있으니, 유의하기 바란다.
참고로 cassandra 종료 스크립트이다.
nodetool disablegossip
sleep 2
nodetool disablethrift
sleep 2
nodetool drain
sleep 2
jps | grep CassandraDaemon | awk {'print $1'} | xargs kill -9
참고로 cassandra 시작 스크립트이다.
CASSANDRA_CONF=/etc/cassandra/conf cassandra
'cassandra' 카테고리의 다른 글
[cassandra] 설정 수정 주의 (0) | 2017.03.17 |
---|---|
[cassandra] 클라이언트 라이브러리 하위 호환성 지원 (cassandra java driver compatibility) (0) | 2017.02.23 |
[cassandra] node subrange repair 작업 - 범위 단위로 repairing 하기 (0) | 2017.01.31 |
[cassandra] node repair 처리 방식 (0) | 2017.01.31 |
[cassandra] 백업 - nodetool snapshot (0) | 2017.01.31 |