Posts for 2016/12/07

3 posts for '2016/12/07'

2016.12.07 [cassandra] Getting select count(*)
2016.12.07 Guava's hash function and Redis's hash function (murmur)
2016.12.07 [capistrano] Calling another task

[cassandra] Getting select count(*)

cassandra 2016. 12. 7. 20:28

To get the number of rows in a Cassandra table, you can use the following CQL.

    select count(*) from table

However, if the table holds a large amount of data, the query times out.

You can raise the timeout, for example with cqlsh --request-timeout="60", but this can cause performance problems.

https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlsh.html

--request-timeout="timeout" CQL request timeout in seconds; default: 10

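The real problem, however, is performance.

(If a service reads the count on every request, it is better to maintain it separately with a Cassandra counter. Running a count query against HBase or Cassandra on every call is dangerous!)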

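Another way to get the row count of a large table is to use nodetool.

Run nodetool cfstats and look at Number of keys (estimate) for the table to get an approximate figure. Note that this value counts partition keys, so it matches the row count only when each partition holds a single row.

$ ./nodetool cfstats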

Table (index): table

Number of keys (estimate): 184251

Table: table

Number of keys (estimate): 538971

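You can see where the estimate comes from by following the Cassandra code below.

The estimate is computed with HyperLogLog, a cardinality estimator for data streams, so it is reasonably reliable.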

https://github.com/apache/cassandra/blob/42e0fc5ee221950875d93b4cd007d4f5bcaa4244/src/java/org/apache/cassandra/tools/nodetool/stats/TableStatsHolder.java

    Object estimatedPartitionCount = probe.getColumnFamilyMetric(keyspaceName, tableName, "EstimatedPartitionCount");
    if (Long.valueOf(-1L).equals(estimatedPartitionCount))
    {
        estimatedPartitionCount = 0L;
    }
    statsTable.numberOfKeysEstimate = estimatedPartitionCount;

https://github.com/apache/cassandra/blob/979af884ee4ecef78a21c4bd58992d053256f8f0/src/java/org/apache/cassandra/tools/NodeProbe.java

    /**
     * Retrieve ColumnFamily metrics
     * @param ks Keyspace for which stats are to be displayed or null for the global value
     * @param cf ColumnFamily for which stats are to be displayed or null for the keyspace value (if ks supplied)
     * @param metricName View {@link TableMetrics}.
     */
    public Object getColumnFamilyMetric(String ks, String cf, String metricName)
    {
        try
        {
            ObjectName oName = null;
            if (!Strings.isNullOrEmpty(ks) && !Strings.isNullOrEmpty(cf))
            {
                String type = cf.contains(".") ? "IndexTable" : "Table";
                oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=%s,keyspace=%s,scope=%s,name=%s", type, ks, cf, metricName));
            }
            else if (!Strings.isNullOrEmpty(ks))
            {
                oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=Keyspace,keyspace=%s,name=%s", ks, metricName));
            }
            else
            {
                oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=Table,name=%s", metricName));
            }
            switch(metricName)
            {
                case "BloomFilterDiskSpaceUsed":
                case "BloomFilterFalsePositives":
                case "BloomFilterFalseRatio":
                case "BloomFilterOffHeapMemoryUsed":
                case "IndexSummaryOffHeapMemoryUsed":
                case "CompressionMetadataOffHeapMemoryUsed":
                case "CompressionRatio":
                case "EstimatedColumnCountHistogram":
                case "EstimatedPartitionSizeHistogram":
                case "EstimatedPartitionCount":
                case "KeyCacheHitRate":
                case "LiveSSTableCount":
                case "MaxPartitionSize":
                case "MeanPartitionSize":
                case "MemtableColumnsCount":
                case "MemtableLiveDataSize":
                case "MemtableOffHeapSize":
                case "MinPartitionSize":
                case "PercentRepaired":
                case "RecentBloomFilterFalsePositives":
                case "RecentBloomFilterFalseRatio":
                case "SnapshotsSize":
                    return JMX.newMBeanProxy(mbeanServerConn, oName, CassandraMetricsRegistry.JmxGaugeMBean.class).getValue();
                ...
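The excerpt above builds a JMX ObjectName from the keyspace, table, and metric name. As a minimal sketch using only the JDK's javax.management classes, the name that nodetool would look up for EstimatedPartitionCount can be reproduced as follows. The class name MetricNameDemo and the keyspace/table names my_keyspace and my_table are hypothetical, chosen only for illustration; the format string is copied from the excerpt.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Illustrative only: mirrors the name-building branch for a keyspace + table.
final class MetricNameDemo {

    // Builds the MBean name the same way as the first branch in the excerpt.
    static ObjectName tableMetricName(String ks, String cf, String metricName) {
        // A "." in the table name marks a secondary index table in the excerpt.
        String type = cf.contains(".") ? "IndexTable" : "Table";
        try {
            return new ObjectName(String.format(
                "org.apache.cassandra.metrics:type=%s,keyspace=%s,scope=%s,name=%s",
                type, ks, cf, metricName));
        } catch (MalformedObjectNameException e) {
            // Wrap the checked exception so callers do not have to declare it.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // my_keyspace and my_table are made-up names.
        ObjectName name = tableMetricName("my_keyspace", "my_table", "EstimatedPartitionCount");
        System.out.println(name.getDomain());
        System.out.println(name.getKeyProperty("type"));
        System.out.println(name.getKeyProperty("name"));
    }
}
```

With such a name, any JMX client connected to the node could read the same gauge that nodetool prints, which is why the value is cheap to obtain compared with select count(*).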

https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/metrics/TableMetrics.java

    public class TableMetrics
    {
        /** Approximate number of keys in table. */
        public final Gauge<Long> estimatedPartitionCount;

        estimatedPartitionCount = Metrics.register(factory.createMetricName("EstimatedPartitionCount"),
                                                   aliasFactory.createMetricName("EstimatedRowCount"),
                                                   new Gauge<Long>()
        {
            public Long getValue()
            {
                long memtablePartitions = 0;
                for (Memtable memtable : cfs.getTracker().getView().getAllMemtables())
                    memtablePartitions += memtable.partitionCount();
                return SSTableReader.getApproximateKeyCount(cfs.getSSTables(SSTableSet.CANONICAL)) + memtablePartitions;
            }
        });

https://github.com/apache/cassandra/blob/4a2464192e9e69457f5a5ecf26c094f9298bf069/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java

    /**
     * Calculate approximate key count.
     * If cardinality estimator is available on all given sstables, then this method use them to estimate
     * key count.
     * If not, then this uses index summaries.
     * @param sstables SSTables to calculate key count
     * @return estimated key count
     */
    public static long getApproximateKeyCount(Iterable<SSTableReader> sstables)
    {
        long count = -1;
        if (Iterables.isEmpty(sstables))
            return count;
        boolean failed = false;
        ICardinality cardinality = null;
        for (SSTableReader sstable : sstables)
        {
            if (sstable.openReason == OpenReason.EARLY)
                continue;
            try
            {
                CompactionMetadata metadata = (CompactionMetadata) sstable.descriptor.getMetadataSerializer().deserialize(sstable.descriptor, MetadataType.COMPACTION);
                // If we can't load the CompactionMetadata, we are forced to estimate the keys using the index
                // summary. (CASSANDRA-10676)
                if (metadata == null)
                {
                    logger.warn("Reading cardinality from Statistics.db failed for {}", sstable.getFilename());
                    failed = true;
                    break;
                }
                if (cardinality == null)
                    cardinality = metadata.cardinalityEstimator;
                else
                    cardinality = cardinality.merge(metadata.cardinalityEstimator);
            }
            catch (IOException e)
            {
                logger.warn("Reading cardinality from Statistics.db failed.", e);
                failed = true;
                break;
            }
            catch (CardinalityMergeException e)
            {
                logger.warn("Cardinality merge failed.", e);
                failed = true;
                break;
            }
        }
        if (cardinality != null && !failed)
            count = cardinality.cardinality();
        // if something went wrong above or cardinality is not available, calculate using index summary
        if (count < 0)
        {
            for (SSTableReader sstable : sstables)
                count += sstable.estimatedKeys();
        }
        return count;
    }

https://github.com/apache/cassandra/blob/4a2464192e9e69457f5a5ecf26c094f9298bf069/src/java/org/apache/cassandra/io/sstable/metadata/CompactionMetadata.java

    /**
     * Compaction related SSTable metadata.
     * Only loaded for <b>compacting</b> SSTables at the time of compaction.
     */
    public class CompactionMetadata extends MetadataComponent
    {
        public static final IMetadataComponentSerializer serializer = new CompactionMetadataSerializer();
        public final ICardinality cardinalityEstimator;

        public CompactionMetadata(ICardinality cardinalityEstimator)
        {
            this.cardinalityEstimator = cardinalityEstimator;
        }

        ...

        public static class CompactionMetadataSerializer implements IMetadataComponentSerializer<CompactionMetadata>
        {
            public int serializedSize(Version version, CompactionMetadata component) throws IOException
            {
                int sz = 0;
                byte[] serializedCardinality = component.cardinalityEstimator.getBytes();
                return TypeSizes.sizeof(serializedCardinality.length) + serializedCardinality.length + sz;
            }

            public void serialize(Version version, CompactionMetadata component, DataOutputPlus out) throws IOException
            {
                ByteBufferUtil.writeWithLength(component.cardinalityEstimator.getBytes(), out);
            }

            public CompactionMetadata deserialize(Version version, DataInputPlus in) throws IOException
            {
                ICardinality cardinality = HyperLogLogPlus.Builder.build(ByteBufferUtil.readBytes(in, in.readInt()));
                return new CompactionMetadata(cardinality);
            }
        }
    }
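To make the merge step in getApproximateKeyCount more concrete, here is a minimal sketch of a plain HyperLogLog in Java. It is not Cassandra's HyperLogLogPlus and omits its refinements. The class HllSketch, the register count, and the SplitMix64 mixer used as a hash are all assumptions made for illustration. The point it tries to show: merging per-SSTable estimators counts a key that lives in several SSTables only once, whereas adding up per-SSTable estimates would count it repeatedly.

```java
// Simplified HyperLogLog for illustration; NOT the HyperLogLogPlus used by Cassandra.
final class HllSketch {
    private static final int P = 12;          // index bits, giving 2^12 = 4096 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M];

    // SplitMix64 finalizer, used here as a stand-in 64-bit hash function.
    private static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Record one key: the top P bits pick a register, the rest give the rank.
    void offer(long key) {
        long h = mix64(key);
        int idx = (int) (h >>> (64 - P));
        int rank = Math.min(Long.numberOfLeadingZeros(h << P) + 1, 64 - P + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    // Merging is a register-wise maximum, so duplicates across sketches collapse.
    HllSketch merge(HllSketch other) {
        HllSketch out = new HllSketch();
        for (int i = 0; i < M; i++)
            out.registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        return out;
    }

    // Standard HyperLogLog estimate with the small-range (linear counting) correction.
    long cardinality() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);       // 2^(-r)
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M);
        double estimate = alpha * M * M / sum;
        if (estimate <= 2.5 * M && zeros > 0)
            estimate = M * Math.log((double) M / zeros);
        return Math.round(estimate);
    }

    // Helper: a sketch holding the keys from..to-1 (stands in for one SSTable).
    static HllSketch ofRange(long from, long to) {
        HllSketch s = new HllSketch();
        for (long k = from; k < to; k++) s.offer(k);
        return s;
    }

    public static void main(String[] args) {
        HllSketch a = ofRange(0, 60_000);        // "SSTable" A
        HllSketch b = ofRange(40_000, 100_000);  // "SSTable" B, overlapping 20,000 keys
        System.out.println("merged estimate: " + a.merge(b).cardinality());
        System.out.println("sum of estimates: " + (a.cardinality() + b.cardinality()));
    }
}
```

With 4096 registers the typical relative error is roughly 1 to 2 percent, so the merged estimate should land near the 100,000 distinct keys while the sum lands near 120,000.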


Posted by '김용환'

Guava's hash function and Redis's hash function (murmur)

general java 2016. 12. 7. 17:32

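Google's Guava provides a set of hash functions.

They live in the class com.google.common.hash.Hashing.

Helpfully, there is a summary at http://goo.gl/jS7HH that tells you which one to choose.

adler32, crc32, sha256, and the like are not recommended there. Some functions are also marked as not stable, so read the notes carefully and pick one that fits your use.

For example, Hashing.murmur3_32 has the following methods.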

    /**
     * Returns a hash function implementing the
     * <a href="http://smhasher.googlecode.com/svn/trunk/MurmurHash3.cpp">
     * 32-bit murmur3 algorithm, x86 variant</a> (little-endian variant),
     * using the given seed value.
     * <p>The exact C++ equivalent is the MurmurHash3_x86_32 function (Murmur3A).
     */
    public static HashFunction murmur3_32(int seed) {
        return new Murmur3_32HashFunction(seed);
    }

    /**
     * Returns a hash function implementing the
     * <a href="http://smhasher.googlecode.com/svn/trunk/MurmurHash3.cpp">
     * 32-bit murmur3 algorithm, x86 variant</a> (little-endian variant),
     * using a seed value of zero.
     * <p>The exact C++ equivalent is the MurmurHash3_x86_32 function (Murmur3A).
     */
    public static HashFunction murmur3_32() {
        return Murmur3_32Holder.MURMUR3_32;
    }

It can be used as follows.

    com.google.common.hash.HashFunction algorithm = com.google.common.hash.Hashing.murmur3_32(0);
    long b = algorithm.hashBytes("google:plus:20161223:na".getBytes()).padToLong();
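For readers who want to see what the 32-bit murmur3 (x86 variant) named in the Javadoc computes, here is a minimal JDK-only sketch of MurmurHash3_x86_32, plus the zero-extension to long that padToLong performs on a 32-bit hash. The class name Murmur3Demo is hypothetical, and this is an independent re-implementation for illustration, not Guava's code.

```java
import java.nio.charset.StandardCharsets;

// Illustrative re-implementation of MurmurHash3_x86_32; not taken from Guava.
final class Murmur3Demo {

    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int nblocks = len / 4;

        // Body: consume the input four bytes at a time, little-endian.
        for (int i = 0; i < nblocks; i++) {
            int k1 = (data[4 * i] & 0xff)
                   | (data[4 * i + 1] & 0xff) << 8
                   | (data[4 * i + 2] & 0xff) << 16
                   | (data[4 * i + 3] & 0xff) << 24;
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes (cases fall through on purpose).
        int tail = nblocks * 4;
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                k1 ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                k1 ^= (data[tail] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: mix in the length and avalanche the bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // A 32-bit hash widened to long without sign extension.
    static long padToLong(int hash) {
        return hash & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] key = "google:plus:20161223:na".getBytes(StandardCharsets.UTF_8);
        System.out.println(padToLong(murmur3_32(key, 0)));
    }
}
```

One caveat when comparing hash values across libraries: as far as I know, Jedis's MURMUR_HASH is a 64-bit MurmurHash2 variant rather than murmur3, so the same key should not be expected to produce the same number in both.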

For reference, Jedis also ships hashing code.

https://github.com/xetorthio/jedis/blob/master/src/main/java/redis/clients/util/Hashing.java

It can be used as follows.

redis.clients.util.Hashing algorithm = redis.clients.util.Hashing.MURMUR_HASH;

long b = algorithm.hash(redis.clients.util.SafeEncoder.encode("google:plus:20161223:na"));


Posted by '김용환'

[capistrano] Calling another task

Ruby 2016. 12. 7. 14:53

This post shows how to call another task from a Capistrano task, with examples. There are two ways.

1. Call invoke

2. Call Rake::Task

    Rake::Task["namespace:task"].invoke

1. Calling invoke

    task :status do
      invoke 'deploy:status'
    end

    task :ping do
      invoke 'deploy:ping'
    end

2. Calling Rake::Task

    task :change do
      Rake::Task["deploy:change_port"].invoke
    end

    task :change_port do
    end


Posted by '김용환'
