The org.apache.kafka.clients.producer.Partitioner interface changed in version 0.10.1.1 as follows.


kafka 0.8


public int partition(Object key, int partitions)



kafka 0.10.1.1


public interface Partitioner extends Configurable {

    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes The serialized key to partition on (or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes The serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);

    /**
     * This is called when partitioner is closed.
     */
    public void close();

}




In 0.8, partitioning was done like this:


public int partition(Object key, int partitions) {
    ...
    // key is assumed to be an Integer (as in the 0.10 example below)
    return ((Integer) key) % partitions;
}






In the new API, the partition count is obtained from the Cluster passed into partition():



@Override
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    // the number of partitions now comes from the cluster metadata
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int partitionsCount = partitions.size();

    // negative ids fall back to a random partition ('random' is a java.util.Random field on the partitioner)
    int id = (Integer) key;
    if (id < 0) {
        return random.nextInt(partitionsCount);
    }
    int partitionId = id % partitionsCount;
    return partitionId;
}
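
For reference, a minimal sketch of a complete 0.10.x custom partitioner looks like the following. The class name MyIdPartitioner is made up for illustration.

import java.util.List;
import java.util.Map;
import java.util.Random;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class MyIdPartitioner implements Partitioner {

    private final Random random = new Random();

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed for this example
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int partitionsCount = partitions.size();
        int id = (Integer) key;
        if (id < 0) {
            return random.nextInt(partitionsCount);
        }
        return id % partitionsCount;
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}

The partitioner is then registered on the producer with the partitioner.class property (broker address below is hypothetical):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.IntegerSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyIdPartitioner.class.getName());
KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);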


Posted by '김용환'


Spring scheduling normally uses cron expressions, but you can keep tasks from all firing at exactly the same instant (second 0) by adding a random offset to the schedule.



Spring 3.x does not support SpEL in the scheduling annotations, so this has to be configured in XML.



<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:task="http://www.springframework.org/schema/task"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd">

    <task:executor id="googleExecutor" pool-size="5-30" queue-capacity="10000"/>
    <task:annotation-driven scheduler="googleScheduler" executor="googleExecutor"/>

    <task:scheduler id="googleScheduler" pool-size="30"/>

    <task:scheduled-tasks scheduler="googleScheduler">
        <task:scheduled ref="googleFeedService" method="reloadCache"
                        fixed-delay="#{new Double(T(java.lang.Math).random()*3000).intValue() + 600000}"/>
    </task:scheduled-tasks>

</beans>
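
The googleFeedService referenced above is an ordinary Spring bean. A minimal sketch might look like this (class name and body are hypothetical):

import org.springframework.stereotype.Service;

@Service("googleFeedService")
public class GoogleFeedService {

    // Called by the <task:scheduled> definition above roughly every 10 minutes,
    // with a random 0-3 second jitter added to the fixed delay.
    public void reloadCache() {
        // reload the feed cache here
    }
}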





In Spring 4.3, on the other hand, the same thing can be done easily with the annotation alone:


@Scheduled(fixedRate = 600000, initialDelayString = "#{new Double(T(java.lang.Math).random() * 3000).intValue()}")
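
As a rough sketch, the annotation would sit on a scheduled bean like the following (class and method names are made up, and @EnableScheduling is assumed on a configuration class):

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
class SchedulingConfig {
}

@Component
public class FeedCacheReloader {

    // Runs every 10 minutes; the first run is delayed by a random 0-3 seconds
    // so that multiple instances do not all fire at exactly the same moment.
    @Scheduled(fixedRate = 600000,
               initialDelayString = "#{new Double(T(java.lang.Math).random() * 3000).intValue()}")
    public void reloadCache() {
        // reload work here
    }
}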




Posted by '김용환'


https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html


Databricks has published a good introduction to Spark 2.0.



Performance has improved dramatically.







primitive                Spark 1.6    Spark 2.0
filter                   15ns         1.1ns
sum w/o group            14ns         0.9ns
sum w/ group             79ns         10.7ns
hash join                115ns        4.0ns
sort (8-bit entropy)     620ns        5.3ns
sort (64-bit entropy)    620ns        40ns
sort-merge join          750ns        700ns






The API has also improved.

DataFrame is easier to use, and a new SparkSession entry point was added to replace HiveContext (SQLContext).
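
As a rough sketch (assuming a local Spark 2.0 installation; the app name, master, and input path are made up for illustration), creating a SparkSession from Java looks like this:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSessionExample {
    public static void main(String[] args) {
        // SparkSession replaces SQLContext/HiveContext as the single entry point
        SparkSession spark = SparkSession.builder()
                .appName("spark-2.0-intro")
                .master("local[*]")              // hypothetical: run locally for the example
                .getOrCreate();

        // In 2.0, DataFrame is simply Dataset<Row>
        Dataset<Row> df = spark.read().json("/tmp/people.json");   // hypothetical input path
        df.printSchema();
        df.show();

        spark.stop();
    }
}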




  • Unifying DataFrames and Datasets in Scala/Java: Starting in Spark 2.0, DataFrame is just a type alias for Dataset of Row. Both the typed methods (e.g. map, filter, groupByKey) and the untyped methods (e.g. select, groupBy) are available on the Dataset class (see the sketch after this list). Also, this new combined Dataset interface is the abstraction used for Structured Streaming. Since compile-time type-safety in Python and R is not a language feature, the concept of Dataset does not apply to these languages’ APIs. Instead, DataFrame remains the primary programming abstraction, which is analogous to the single-node data frame notion in these languages. Get a peek from a Dataset API notebook.
  • SparkSession: a new entry point that replaces the old SQLContext and HiveContext. For users of the DataFrame API, a common source of confusion for Spark is which “context” to use. Now you can use SparkSession, which subsumes both, as a single entry point, as demonstrated in this notebook. Note that the old SQLContext and HiveContext are still kept for backward compatibility.
  • Simpler, more performant Accumulator API: We have designed a new Accumulator API that has a simpler type hierarchy and supports specialization for primitive types. The old Accumulator API has been deprecated but retained for backward compatibility.
  • DataFrame-based Machine Learning API emerges as the primary ML API: With Spark 2.0, the spark.ml package, with its “pipeline” APIs, will emerge as the primary machine learning API. While the original spark.mllib package is preserved, future development will focus on the DataFrame-based API.
  • Machine learning pipeline persistence: Users can now save and load machine learning pipelines and models across all programming languages supported by Spark.
  • Distributed algorithms in R: Added support for Generalized Linear Models (GLM), Naive Bayes, Survival Regression, and K-Means in R.
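
A rough Java sketch of the unified Dataset API described in the first bullet (the local master and the toy data are made up for illustration):

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DatasetApiExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dataset-api")
                .master("local[*]")
                .getOrCreate();

        // Untyped API: DataFrame is Dataset<Row>, so select/groupBy work directly
        Dataset<Row> df = spark.range(100).toDF("id");
        df.groupBy("id").count().show();

        // Typed API: map/filter with encoders, on the same Dataset abstraction
        Dataset<Long> doubled = spark.range(100)
                .map((MapFunction<Long, Long>) x -> x * 2, Encoders.LONG());
        Dataset<Long> filtered = doubled.filter((FilterFunction<Long>) x -> x > 100);
        System.out.println(filtered.count());

        spark.stop();
    }
}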


Posted by '김용환'