cassandra를 운영하면서 CMS Full GC가 발생하면서 수십 초동안 stop the world (hang)이 발생했다. 이를 간단히 해결한 내용을 작성한다. 





Case #1

node repair를 하고 난 직후 바로 다른 table(colum family)에서 compaction이 일어났고 CMS Full GC가 발생했다. 

수십 초동안 stop the world (hang)이 발생했다. 



<system.log>


INFO  [MemtableFlushWriter:30358] 2017-02-10 19:18:38,616 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/googleplus/activity_counts-2b2d7da08ec211e6aa074b983626f83b/googleplus-activity_counts-tmp-ka-199677-Data.db (5.208MiB) for commitlog position ReplayPosition(segmentId=1483928416854, position=30283693)

INFO  [Service Thread] 2017-02-10 19:23:54,659 GCInspector.java:252 - ParNew GC in 371ms.  CMS Old Gen: 3756257272 -> 4008698560; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 99052984 -> 214695936

INFO  [Service Thread] 2017-02-10 19:24:03,365 GCInspector.java:252 - ParNew GC in 206ms.  CMS Old Gen: 5184283352 -> 5516390704; Par Eden Space: 1718091776 -> 0;

INFO  [Service Thread] 2017-02-10 19:24:03,767 GCInspector.java:252 - ParNew GC in 223ms.  CMS Old Gen: 5516390704 -> 5851059920; Par Eden Space: 1718091776 -> 0;

INFO  [Service Thread] 2017-02-10 19:24:33,346 GCInspector.java:252 - ParNew GC in 21623ms.  CMS Old Gen: 5851059920 -> 6084658992;

INFO  [Service Thread] 2017-02-10 19:24:33,346 StatusLogger.java:51 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked

INFO  [Service Thread] 2017-02-10 19:24:33,385 StatusLogger.java:66 - MutationStage                    96       837     2082221481         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - RequestResponseStage              0         0     3415222266         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - ReadRepairStage                   0         0      129691312         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - CounterMutationStage             14         0       61697119         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,388 StatusLogger.java:66 - ReadStage                        40        23     1525495767         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - MiscStage                         0         0              0         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - AntiEntropySessions               0         0          30954         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - HintedHandoff                     0         1           4704         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - GossipStage                       0         0        9002749         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - CacheCleanupExecutor              0         0              0         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - InternalResponseStage             0         0          27677         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,389 StatusLogger.java:66 - CommitLogArchiver                 0         0              0         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - CompactionExecutor                2        65        3178052         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - ValidationExecutor                0         0         772255         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - MigrationStage                    0         0              7         0                 0

INFO  [Service Thread] 2017-02-10 19:24:33,390 StatusLogger.java:66 - AntiEntropyStage                  0         0        6186269         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,357 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:51 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:66 - MutationStage                    96      2022     2082221481         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:66 - RequestResponseStage              0         0     3415222266         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,396 StatusLogger.java:66 - ReadRepairStage                   0         0      129691312         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - CounterMutationStage             14         0       61697119         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - ReadStage                        43       329     1525495974         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - MiscStage                         0         0              0         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - AntiEntropySessions               0         0          30954         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - HintedHandoff                     0         1           4704         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,397 StatusLogger.java:66 - GossipStage                       0         0        9002749         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - CacheCleanupExecutor              0         0              0         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - InternalResponseStage             0         0          27677         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - CommitLogArchiver                 0         0              0         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - CompactionExecutor                2        65        3178052         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - ValidationExecutor                0         0         772255         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,398 StatusLogger.java:66 - MigrationStage                    0         0              7         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - AntiEntropyStage                  0         0        6186269         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - PendingRangeCalculator            0         0             11         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - Sampler                           0         0              0         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - MemtableFlushWriter               0         0         284744         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - MemtablePostFlush                 1         1         969941         0                 0

INFO  [ScheduledTasks:1] 2017-02-10 19:24:33,399 StatusLogger.java:66 - MemtableReclaimMemory             0         0         361430         0                 0

WARN  [GossipTasks:1] 2017-02-10 19:24:33,423 FailureDetector.java:249 - Not marking nodes down due to local pause of 30438450773 > 5000000000

INFO  [GossipTasks:1] 2017-02-10 19:24:33,423 Gossiper.java:968 - InetAddress /1.1.1.1 is now DOWN

INFO  [GossipTasks:1] 2017-02-10 19:24:33,426 Gossiper.java:968 - InetAddress /1.1.1.2 is now DOWN

INFO  [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.3 is now DOWN

INFO  [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.4 is now DOWN

INFO  [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.5 is now DOWN

INFO  [GossipTasks:1] 2017-02-10 19:24:33,427 Gossiper.java:968 - InetAddress /1.1.1.6 is now DOWN

INFO  [GossipTasks:1] 2017-02-10 19:24:33,428 Gossiper.java:968 - InetAddress /1.1.1.7 is now DOWN

...





INFO  [MemtableFlushWriter:30358] 2017-02-10 19:18:38,616 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/googleplus/activity_counts-2b2d7da08ec211e6aa074b983626f83b/googleplus-activity_counts-tmp-ka-199677-Data.db (5.208MiB) for commitlog position ReplayPosition(segmentId=1483928416854, position=30283693)

INFO  [Service Thread] 2017-02-10 19:23:54,659 GCInspector.java:252 - ParNew GC in 371ms.  CMS Old Gen: 3756257272 -> 4008698560; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 99052984 -> 214695936

INFO  [Service Thread] 2017-02-10 19:24:03,365 GCInspector.java:252 - ParNew GC in 206ms.  CMS Old Gen: 5184283352 -> 5516390704; Par Eden Space: 1718091776 -> 0;

INFO  [Service Thread] 2017-02-10 19:24:03,767 GCInspector.java:252 - ParNew GC in 223ms.  CMS Old Gen: 5516390704 -> 5851059920; Par Eden Space: 1718091776 -> 0;

INFO  [Service Thread] 2017-02-10 19:24:33,346 GCInspector.java:252 - ParNew GC in 21623ms.  CMS Old Gen: 5851059920 -> 6084658992;







Number Of Blocks이 굉장히 낮았음에도 불구하고 old gen이 여유가 있음에도 CMS Full GC가 발생했다.



<gc.log>


2017-02-10T19:24:03.543+0900: 2793841.374: [GC (Allocation Failure) Before GC:

Statistics for BinaryTreeDictionary:

------------------------------------

Total Free Space: 61954108

Max   Chunk Size: 56091256

Number of Blocks: 281

Av.  Block  Size: 220477

Tree      Height: 24

2793841.375: [ParNew

Desired survivor size 107347968 bytes, new threshold 1 (max 1)

- age   1:  214016272 bytes,  214016272 total

: 1887488K->209664K(1887488K), 0.2227276 secs] 7274588K->5923589K(8178944K)After GC:

Statistics for BinaryTreeDictionary:

------------------------------------

Total Free Space: 23251391

Max   Chunk Size: 23251391

Number of Blocks: 1

Av.  Block  Size: 23251391

Tree      Height: 1

, 0.2230334 secs] [Times: user=2.82 sys=0.00, real=0.22 secs]

Heap after GC invocations=262308 (full 750):

 par new generation   total 1887488K, used 209664K [0x00000005c0000000, 0x0000000640000000, 0x0000000640000000)

  eden space 1677824K,   0% used [0x00000005c0000000, 0x00000005c0000000, 0x0000000626680000)

  from space 209664K, 100% used [0x0000000626680000, 0x0000000633340000, 0x0000000633340000)

  to   space 209664K,   0% used [0x0000000633340000, 0x0000000633340000, 0x0000000640000000)

 concurrent mark-sweep generation total 6291456K, used 5713925K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)

 Metaspace       used 38684K, capacity 39508K, committed 39740K, reserved 1085440K

  class space    used 4042K, capacity 4201K, committed 4216K, reserved 1048576K

}




2017-02-10T19:24:03.766+0900: 2793841.598: Total time for which application threads were stopped: 0.2254495 seconds, Stopping threads took: 0.0007439 seconds

{Heap before GC invocations=262308 (full 750):

 par new generation   total 1887488K, used 1887488K [0x00000005c0000000, 0x0000000640000000, 0x0000000640000000)

  eden space 1677824K, 100% used [0x00000005c0000000, 0x0000000626680000, 0x0000000626680000)

  from space 209664K, 100% used [0x0000000626680000, 0x0000000633340000, 0x0000000633340000)

  to   space 209664K,   0% used [0x0000000633340000, 0x0000000633340000, 0x0000000640000000)

 concurrent mark-sweep generation total 6291456K, used 5713925K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)

 Metaspace       used 38684K, capacity 39508K, committed 39740K, reserved 1085440K

  class space    used 4042K, capacity 4201K, committed 4216K, reserved 1048576K

2017-02-10T19:24:03.966+0900: 2793841.798: [GC (Allocation Failure) Before GC:

Statistics for BinaryTreeDictionary:

------------------------------------

Total Free Space: 23251391

Max   Chunk Size: 23251391

Number of Blocks: 1

Av.  Block  Size: 23251391

Tree      Height: 1

2793841.798: [ParNew (0: promotion failure size = 4)  (1: promotion failure size = 15)  (2: promotion failure size = 14)  (3: promotion failure size = 8194)  (4: promotion failure size = 15)  (5: promotion failure size = 36)  (6: promotion failure size = 4)  (7: promotion failure size = 14)  (8: promotion failure size = 8229)  (9: promotion failure size = 14)  (10: promotion failure size = 14)  (11: promotion failure size = 4)  (12: promotion failure size = 35)  (13: promotion failure size = 4)  (14: promotion failure size = 15)  (15: promotion failure size = 4)  (16: promotion failure size = 14)  (17: promotion failure size = 8194)  (promotion failed): 1887488K->1887488K(1887488K), 21.6232257 secs]2793863.422: [CMS2017-02-10T19:24:26.299+0900: 2793864.131: [CMS-concurrent-mark: 2.053/24.445 secs] [Times: user=128.39 sys=13.12, real=24.44 secs]

 (concurrent mode failure)CMS: Large block 0x00000006ad3f4960

: 5942049K->1789778K(6291456K), 7.7547756 secs] 7601413K->1789778K(8178944K), [Metaspace: 38684K->38684K(1085440K)]After GC:

Statistics for BinaryTreeDictionary:

------------------------------------

Total Free Space: 576198356

Max   Chunk Size: 576198356

Number of Blocks: 1

Av.  Block  Size: 576198356

Tree      Height: 1

, 29.3783154 secs] [Times: user=106.14 sys=4.60, real=29.38 secs]

Heap after GC invocations=262309 (full 751):

 par new generation   total 1887488K, used 0K [0x00000005c0000000, 0x0000000640000000, 0x0000000640000000)

  eden space 1677824K,   0% used [0x00000005c0000000, 0x00000005c0000000, 0x0000000626680000)

  from space 209664K,   0% used [0x0000000633340000, 0x0000000633340000, 0x0000000640000000)

  to   space 209664K,   0% used [0x0000000626680000, 0x0000000626680000, 0x0000000633340000)

 concurrent mark-sweep generation total 6291456K, used 1789778K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)

 Metaspace       used 38684K, capacity 39508K, committed 39740K, reserved 1085440K

  class space    used 4042K, capacity 4201K, committed 4216K, reserved 1048576K

}

2017-02-10T19:24:33.345+0900: 2793871.177: Total time for which application threads were stopped: 29.3870599 seconds, Stopping threads took: 0.0070944 seconds

2017-02-10T19:24:33.348+0900: 2793871.180: Total time for which application threads were stopped: 0.0018513 seconds, Stopping threads took: 0.0002945 seconds

2017-02-10T19:24:33.357+0900: 2793871.188: Total time for which application threads were stopped: 0.0062973 seconds, Stopping threads took: 0.0002382 seconds

2017-02-10T19:24:33.375+0900: 2793871.206: Total time for which application threads were stopped: 0.0132915 seconds, Stopping threads took: 0.0113025 seconds




원인

young gen(from, to, eden) 영역의 데이터를 old gen으로 못 올리는 이슈가 있었다. 

wide row도 영향이 있는 상태라 판단했다. 



# case 2


SNS 서비스의 hotspotting 사용자(친구가 많은 사용자)가 글을 쓸 때 cassandra에서 timeout이 발생하기도 했고 어쩔 때는 한 번에 많은 사용자가 hotspotting 사용자의 데이터를 읽으면서 #case 1처럼 young 영역에서 CMS full gc가 발생했다. 


이 문제를 해결하기 위해 gc를 간단히 튜닝했는데, 바로 문제가 해결되었다.


아래는 hotspotting 사용자가 글을 썼을 때 사용자들이 반응하였다. full gc와 full gc로 인한 cpu가 튀는 것을 볼 수 있다. 





cassandra의 전체 메모리를 좀 늘리고 (8G -> 12G), -XX:CMSInitiatingOccupancyFraction의 값을 35로 낮췄다. 

즉 young gen 영역을 높여서 node repair 이후의 compaction 이슈와 hotspotting 사용자의 데이터 때문에 cassandra가 흔들리지 않도록 했다. 아래와 같이 특정 사용자가 글을 써도 크게 영향을 주지 않음을 알 수 있었다. 

그리고 -XX:CMSWaitDuration=10000를 -XX:CMSWaitDuration=100으로 나춰 최대한 gc 시간으로 인해서 튀는 일이 없도록 했다.


어떤 상황이 와도 full gc는 50~60ms가 나오도록 했다.


더 이상 cassandra는 흔들리지 않고 있다. 





최대한 memory를 확보할 뿐 아니라 cassandra 질의할 때 최대한 데이터를 많이 얻는 구조(네트워크 용량 및 데이터를 얻음으로서 jvm 데이터 방지)도 조정하는 것이 좋을 것 같다. 



또한 재시작 이후에 node간 copy가 일어나서 부하가 높아질 뿐 아니라, compaction이 발생한다. 특히 wide row 테이블 쪽에서 operationtimeout이 발생할 수 있으니, 유의하기 바란다.




참고로 cassandra 종료 스크립트이다.


nodetool disablegossip

sleep 2

nodetool disablethrift

sleep 2

nodetool drain

sleep 2

jps | grep CassandraDaemon | awk {'print $1'} | xargs kill -9





참고로 cassandra 시작 스크립트이다.


CASSANDRA_CONF=/etc/cassandra/conf cassandra

Posted by '김용환'
,