경험상, 통신 서버을 아무리 잘 만들어도 serialization/deserialization에서 성능 저하가 발생한다. 이 부분에 대한 trade off를 늘 고민해야 한다.




최근 트위터에서 스트림 처리 서버 관련 내용을 잘 설명해서 펌질 한다.



http://delivery.acm.org/10.1145/2750000/2742788/p239-kulkarni.pdf?ip=211.56.96.51&id=2742788&acc=TRUSTED&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2EE47D41B086F0CDA3&CFID=743695794&CFTOKEN=58998503&__acm__=1490582784_82cd3df2ca63b6c7de75e31c941cdfac




https://blog.twitter.com/2017/optimizing-twitter-heron



  • Repeated Serialization - A tuple is represented as a collection of plain old Java objects. The Java objects are converted into byte arrays using either Kryo or Java serialization. The byte arrays are again serialized when included in a protocol buffers object used for data exchange between stream managers and Heron instances.
  • Eager Deserialization - The stream manager is responsible for receiving and routing protocol buffer tuples. When it receives a tuple, it eagerly deserializes it into a C++ object.
  • Immutability - To simplify the implementation and reasoning, stream manager does not reuse any protobuf objects. For each message received, it uses the malloc allocator to allocate a protobuf object, which it then releases back to the allocator once the operation is completed. Instead of modifying the protobuf in place, it copies the contents to a newly allocated message, makes the modification on the new message and releases the old one.




  • ~17% of the CPU is used to create/delete a protobuf object from memory allocator (not including those protobuf objects allocated on stack).
  • ~15% of the CPU is used to copy a new protobuf object instead of updating one in place.
  • ~18% of the CPU is used to eagerly deserialize a protobuf message, despite the fact that eager deserialization is not needed; instead we could just handle the byte array.




리팩토링 한 부분


  • Added a separate memory pool for each type of protobuf message thereby reducing the expensive cost to create/delete a protobuf message.
  • Changed an internal data structure that caches tuples from std::list to std::deque to facilitate preallocation of protobuf messages.
  • Optimized away the code that was duplicating the protobuf message to do in-place update whenever possible.
  • When a stream manager receives a message from another stream manager, instead of eagerly deserializing the inner tuple message, it now transfers the underlying serialized byte array directly to the instance.



Posted by 김용환 '김용환'

댓글을 달아 주세요