경험상, 통신 서버을 아무리 잘 만들어도 serialization/deserialization에서 성능 저하가 발생한다. 이 부분에 대한 trade off를 늘 고민해야 한다.




최근 트위터에서 스트림 처리 서버 관련 내용을 잘 설명해서 펌질 한다.



http://delivery.acm.org/10.1145/2750000/2742788/p239-kulkarni.pdf?ip=211.56.96.51&id=2742788&acc=TRUSTED&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2EE47D41B086F0CDA3&CFID=743695794&CFTOKEN=58998503&__acm__=1490582784_82cd3df2ca63b6c7de75e31c941cdfac




https://blog.twitter.com/2017/optimizing-twitter-heron



  • Repeated Serialization - A tuple is represented as a collection of plain old Java objects. The Java objects are converted into byte arrays using either Kryo or Java serialization. The byte arrays are again serialized when included in a protocol buffers object used for data exchange between stream managers and Heron instances.
  • Eager Deserialization - The stream manager is responsible for receiving and routing protocol buffer tuples. When it receives a tuple, it eagerly deserializes it into a C++ object.
  • Immutability - To simplify the implementation and reasoning, stream manager does not reuse any protobuf objects. For each message received, it uses the malloc allocator to allocate a protobuf object, which it then releases back to the allocator once the operation is completed. Instead of modifying the protobuf in place, it copies the contents to a newly allocated message, makes the modification on the new message and releases the old one.




  • ~17% of the CPU is used to create/delete a protobuf object from memory allocator (not including those protobuf objects allocated on stack).
  • ~15% of the CPU is used to copy a new protobuf object instead of updating one in place.
  • ~18% of the CPU is used to eagerly deserialize a protobuf message, despite the fact that eager deserialization is not needed; instead we could just handle the byte array.




리팩토링 한 부분


  • Added a separate memory pool for each type of protobuf message thereby reducing the expensive cost to create/delete a protobuf message.
  • Changed an internal data structure that caches tuples from std::list to std::deque to facilitate preallocation of protobuf messages.
  • Optimized away the code that was duplicating the protobuf message to do in-place update whenever possible.
  • When a stream manager receives a message from another stream manager, instead of eagerly deserializing the inner tuple message, it now transfers the underlying serialized byte array directly to the instance.



Posted by '김용환'
,