[Repost] Optimizations in Twitter's stream processing server (Heron)

scribbling 2017. 3. 27. 11:42

In my experience, no matter how well a communication server is built, serialization/deserialization still degrades performance. This trade-off always has to be weighed.

Twitter recently published a clear explanation of its stream processing server, so I am reposting the relevant parts here.

Repeated Serialization - A tuple is represented as a collection of plain old Java objects. The Java objects are converted into byte arrays using either Kryo or Java serialization. The byte arrays are again serialized when included in a protocol buffers object used for data exchange between stream managers and Heron instances.
Eager Deserialization - The stream manager is responsible for receiving and routing protocol buffer tuples. When it receives a tuple, it eagerly deserializes it into a C++ object.
Immutability - To simplify the implementation and reasoning, stream manager does not reuse any protobuf objects. For each message received, it uses the malloc allocator to allocate a protobuf object, which it then releases back to the allocator once the operation is completed. Instead of modifying the protobuf in place, it copies the contents to a newly allocated message, makes the modification on the new message and releases the old one.
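
To make the Immutability item concrete, here is a minimal C++ sketch of the copy-then-modify pattern next to an in-place update. `TupleMessage`, `reroute_by_copy`, and `reroute_in_place` are hypothetical names made up for illustration; a plain struct stands in for a generated protobuf class, and none of this is Heron's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated protobuf message; not Heron's real type.
struct TupleMessage {
    std::int32_t dest_task = 0;
    std::string payload;  // serialized tuple bytes
};

// Copy-on-modify, as described in the Immutability item: allocate a new
// message, copy the contents, change the copy, release the old message.
std::unique_ptr<TupleMessage> reroute_by_copy(std::unique_ptr<TupleMessage> old_msg,
                                              std::int32_t new_dest) {
    assert(old_msg != nullptr);
    auto fresh = std::make_unique<TupleMessage>(*old_msg);  // heap allocation + payload copy
    fresh->dest_task = new_dest;
    old_msg.reset();  // the old message goes back to the allocator
    return fresh;
}

// In-place alternative: no allocation and no payload copy.
void reroute_in_place(TupleMessage& msg, std::int32_t new_dest) {
    msg.dest_task = new_dest;
}
```

Both functions leave the message in the same final state; the first simply pays for one extra allocation, one payload copy, and one deallocation per modification.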

~17% of the CPU is used to create/delete a protobuf object from memory allocator (not including those protobuf objects allocated on stack).
~15% of the CPU is used to copy a new protobuf object instead of updating one in place.
~18% of the CPU is used to eagerly deserialize a protobuf message, despite the fact that eager deserialization is not needed; instead we could just handle the byte array.
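
The last item says the router could handle the byte array instead of deserializing it. A rough sketch of that idea follows, under a made-up wire format (a 4-byte big-endian destination id followed by opaque tuple bytes, with comma-separated fields inside). The format and every function name here are assumptions for illustration only, not Heron's protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wire format for this sketch only:
// 4-byte big-endian destination task id, then opaque tuple bytes.
std::string encode_envelope(std::uint32_t dest, const std::string& tuple_bytes) {
    std::string frame;
    frame.reserve(4 + tuple_bytes.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<char>((dest >> shift) & 0xFF));
    frame += tuple_bytes;
    return frame;
}

// Read only the routing header; the tuple bytes are never parsed.
std::uint32_t peek_dest(const std::string& frame) {
    assert(frame.size() >= 4);
    std::uint32_t dest = 0;
    for (int i = 0; i < 4; ++i)
        dest = (dest << 8) | static_cast<unsigned char>(frame[i]);
    return dest;
}

// Eager path: split the tuple into fields even though routing does not need them.
std::vector<std::string> parse_tuple(const std::string& frame) {
    std::vector<std::string> fields(1);
    for (std::size_t i = 4; i < frame.size(); ++i) {
        if (frame[i] == ',') fields.emplace_back();
        else fields.back().push_back(frame[i]);
    }
    return fields;
}

// Lazy path: look at the header, then hand the same bytes to the destination queue.
void forward_raw(std::string frame, std::vector<std::vector<std::string>>& queues) {
    const std::uint32_t dest = peek_dest(frame);
    assert(dest < queues.size());
    queues[dest].push_back(std::move(frame));
}
```

`forward_raw` touches four bytes per message regardless of tuple size, while `parse_tuple` walks and copies the whole payload. A router that only needs the destination can skip the second step entirely.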

What was refactored

Added a separate memory pool for each type of protobuf message thereby reducing the expensive cost to create/delete a protobuf message.
Changed an internal data structure that caches tuples from std::list to std::deque to facilitate preallocation of protobuf messages.
Optimized away the code that was duplicating the protobuf message to do in-place update whenever possible.
When a stream manager receives a message from another stream manager, instead of eagerly deserializing the inner tuple message, it now transfers the underlying serialized byte array directly to the instance.
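
The first two items (a pool per message type, and `std::deque` to make preallocation easy) can be sketched together as below. This is an illustration under assumptions, not Heron's implementation: `DataTuple` is a hypothetical stand-in for a protobuf message, `MessagePool` is a made-up class, and using the deque as the pool's backing store is my own guess at how the two ideas might combine.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a protobuf message type; Clear() mimics the
// reset-for-reuse step a real message would need.
struct DataTuple {
    int dest_task = 0;
    std::string payload;
    void Clear() { dest_task = 0; payload.clear(); }
};

// One pool per message type T. Objects live in a std::deque, which grows at
// the end without relocating existing elements, so handed-out pointers stay valid.
template <typename T>
class MessagePool {
public:
    explicit MessagePool(std::size_t preallocate) {
        for (std::size_t i = 0; i < preallocate; ++i) {
            storage_.emplace_back();
            free_.push_back(&storage_.back());
        }
    }
    MessagePool(const MessagePool&) = delete;             // pointers into storage_
    MessagePool& operator=(const MessagePool&) = delete;  // must not be duplicated

    T* acquire() {
        if (free_.empty()) {  // pool exhausted: grow by one object
            storage_.emplace_back();
            return &storage_.back();
        }
        T* msg = free_.back();
        free_.pop_back();
        return msg;
    }

    void release(T* msg) {
        assert(msg != nullptr);
        msg->Clear();  // reset state; string buffers are typically kept for reuse
        free_.push_back(msg);
    }

    std::size_t allocated() const { return storage_.size(); }

private:
    std::deque<T> storage_;  // owns every message ever created by this pool
    std::vector<T*> free_;   // messages currently available for reuse
};
```

With a pool like this, a steady-state message loop calls `acquire()` and `release()` without going through the general-purpose allocator. The pool is not thread-safe as written, so each thread would need its own pool or external locking.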


Posted by '김용환'
