2017/03 posts (page 2)

[spark] Only one SparkContext may be running in this JVM - StreamingContext

scala 2017. 3. 24. 19:53

If you follow the Spark Streaming example code as-is to test streaming in spark-shell, it throws the exception "Only one SparkContext may be running in this JVM".

scala> import org.apache.spark._

import org.apache.spark._

scala> import org.apache.spark.streaming._

import org.apache.spark.streaming._

scala> import org.apache.spark.streaming.StreamingContext._

import org.apache.spark.streaming.StreamingContext._

scala> val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")

conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@38394e76

scala> val ssc = new StreamingContext(conf, Seconds(1))

org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:

org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)

org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)

The exception is not caused by a running daemon; it occurs because spark-shell already holds a SparkContext (sc). Passing that sc to the constructor, as below, avoids the error and yields a StreamingContext.

import org.apache.spark._

import org.apache.spark.streaming._

import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(sc, Seconds(1))
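As a hedged continuation, the word count that the streaming guide builds could be sketched on top of the shell's sc roughly as follows. The host localhost and port 9999 are assumptions for illustration (for example a local `nc -lk 9999`), not values from this post, and the shell is assumed to have been started with at least two cores because the receiver occupies one.

```scala
// Run inside spark-shell, where `sc` already exists.
import org.apache.spark.streaming._

// Reuse the shell's SparkContext instead of building one from a new SparkConf.
val ssc = new StreamingContext(sc, Seconds(1))

// Assumed source: a text server on localhost:9999 (hypothetical host and port).
val lines = ssc.socketTextStream("localhost", 9999)

// Split each line into words and count the words in every 1-second batch.
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()

// Nothing runs until the context is started.
ssc.start()
```

To end the test without losing the shell's context, `ssc.stop(stopSparkContext = false)` stops only the streaming side and leaves sc usable.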

Posted by '김용환'

[spark] The RDD stats function

scala 2017. 3. 24. 19:24

Spark RDDs provide simple statistics (count, mean, stdev, max, min), and the stats function returns all of them at once.

scala> val a = sc.parallelize(List("111", "222"))

a: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val ints = a.map(string => string.toInt)

ints: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:26

scala> val stats = ints.stats()

stats: org.apache.spark.util.StatCounter = (count: 2, mean: 166.500000, stdev: 55.500000, max: 222.000000, min: 111.000000)

scala> stats.count

res0: Long = 2

scala> stats.mean

res1: Double = 166.5

scala> stats.stdev

res2: Double = 55.5

scala> stats.max

res3: Double = 222.0

scala> stats.min

res4: Double = 111.0
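The stdev of 55.5 above is the population standard deviation (dividing by n), not the sample one (dividing by n - 1). A minimal plain-Scala sketch, with no Spark involved and with variable names chosen only for illustration, recomputes both for the same two values:

```scala
// Plain Scala, no Spark needed: recompute the summary for the same two values.
val values = Seq(111.0, 222.0)

val count = values.size
val mean = values.sum / count

// Sum of squared deviations from the mean.
val squaredDiffs = values.map(v => math.pow(v - mean, 2)).sum

// Population standard deviation divides by n; this matches the stdev above.
val populationStdev = math.sqrt(squaredDiffs / count)

// Sample standard deviation divides by n - 1 and is larger for small inputs.
val sampleStdev = math.sqrt(squaredDiffs / (count - 1))

println(f"count: $count, mean: $mean%.1f, stdev: $populationStdev%.1f")
println(f"sample stdev: $sampleStdev%.4f")
```

If the sample figure is what you need, StatCounter also exposes sampleStdev alongside stdev.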

[cassandra] Performance references on key cache, row cache, and partitions

cassandra 2017. 3. 24. 12:27

When operating Cassandra, the key cache and the row cache are closely tied to performance.

Good articles on the topic:

https://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra

http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/dml/dmlAboutReads.html

A talk reporting a 35% improvement from choosing the row key well:

Cassandra Summit 2014: Performance Tuning Cassandra in AWS from DataStax Academy

On partitions that gradually grow large:

Large partition in Cassandra from Shogo Hoshii

[cassandra] cassandra.yaml settings by version

cassandra 2017. 3. 23. 20:28

The default values in cassandra.yaml differ from version to version, so here are links to the reference for each version.

1.0

http://docs.datastax.com/en/archived/cassandra/1.0/docs/configuration/node_configuration.html

1.2

http://docs.datastax.com/en/archived/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?hl=cassandra.yaml

2.1

https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?hl=commitlog_sync_period_in_ms

3.0

https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html?hl=commitlog_sync

[influxdb] Is InfluxDB clustering available in the free version?

etc tools 2017. 3. 23. 17:06

The free version of InfluxDB supported clustering only up to roughly 0.10.

As of March 2017, only the commercial version supports it.

https://docs.influxdata.com/influxdb/v1.2/high_availability/clusters/

Open-source InfluxDB does not support clustering. For high availability or horizontal scaling of InfluxDB, please investigate our commercial clustered offering, InfluxEnterprise.

[cassandra] cql

cassandra 2017. 3. 23. 00:01

The main cqlsh (CQL shell) commands are roughly as follows.

CAPTURE: captures command output and appends it to a file.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/capture_r.html

CAPTURE '~/mydir/myfile.txt'

CONSISTENCY: shows the current consistency level, or sets it to the given level.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/consistency_r.html

CONSISTENCY

COPY: imports CSV (comma-separated values) data into Cassandra or exports it from Cassandra.

http://docs.datastax.com/en/cql/3.1/cql/cql_reference/copy_r.html

CREATE KEYSPACE test
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };

USE test;

CREATE TABLE airplanes (
  name text PRIMARY KEY,
  manufacturer ascii,
  year int,
  mach float
);

INSERT INTO airplanes
  (name, manufacturer, year, mach)
  VALUES ('P38-Lightning', 'Lockheed', 1937, 0.7);

COPY airplanes (name, manufacturer, year, mach) TO 'temp.csv';

DESCRIBE: provides information about the connected Cassandra cluster and the data objects stored in it.

https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshDescribe.html

DESC keyspaces

EXPAND: prints query results vertically.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/expand.html

cqlsh:my_ks> EXPAND ON
             Now printing expanded output

cqlsh:my_ks> SELECT * FROM users;

EXIT: exits cqlsh.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/exit_r.html

PAGING: enables or disables query paging.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/paging.html

PAGING ON | OFF

SHOW: shows the Cassandra version, the host, or tracing information for the current cqlsh client session.

http://docs.datastax.com/en/cql/3.1/cql/cql_reference/show_r.html

SHOW VERSION | HOST | SESSION tracing_session_id

SOURCE: executes a file containing CQL statements.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/source_r.html

SOURCE 'file'

TRACING: enables or disables request tracing.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tracing_r.html

TRACING ON | OFF

[cassandra] Making the up, down, left, and right arrow keys work in cqlsh

cassandra 2017. 3. 22. 19:17

In cqlsh, the arrow keys (up, down, left, right) may not work.

cqlsh> ^[[D^[[A^[[C^[[B

Invalid syntax at line 1, char 1

If that happens, install the following.

sudo -E yum install -y ncurses-devel

sudo -E pip install readline

[spark 1.6] Accessing Hive

scala 2017. 3. 22. 14:23

In Spark 1.6, you can access a Hive table and fetch its data.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val rows = hiveContext.sql("select * from ... limit 1")
val firstRow = rows.first()
println(firstRow)

If the Hive table holds a large amount of data, the query may return no result and fail with a timeout.

Caused by: java.net.SocketTimeoutException: Read timed out

OpsCenter does not support Cassandra 3.0

cassandra 2017. 3. 21. 18:29

As of March 2017, OpsCenter is not offered for free for Cassandra 3.x and later; only the commercial version is supported.

OpsCenter does not support Cassandra 3.0 or later:

http://docs.datastax.com/en/landing_page/doc/landing_page/compatibility.html#compatibilityDocument__opsc-compatibility

Only Enterprise is supported.

Connecting OpsCenter to Cassandra 3.0 produces the following errors.

2017-01-01 01:23:11+0900 [] WARN: [control connection] Error connecting to 11.11.11.11: <ErrorMessage code=000a [Protocol error] message="Invalid or unsupported protocol version (2); the lowest supported version is 3 and the greatest is 4">

2017-01-01 01:23:11+0900 [] ERROR: Control connection failed to connect, shutting down Cluster: ('Unable to connect to any servers', {u'172.17.56.90': <ErrorMessage code=000a [Protocol error] message="Invalid or unsupported protocol version (2); the lowest supported version is 3 and the greatest is 4">})

2017-01-01 01:23:11+0900 [] WARN: ProcessingError while calling CreateClusterConfController: Unable to connect to cluster. Error is: Unable to connect to any servers

https://medium.com/@mlowicki/alternatives-to-datastax-opscenter-8ad893efe063#.z8hdmauma

Grafana with Graphite seems to be the only practical alternative.

[cassandra] cqlsh - Resolving ProtocolError, Supported versions

cassandra 2017. 3. 21. 15:40

There was a bug that prevented cqlsh from running on Python 2.7.11 and later. It was patched in Cassandra 2.1.16, 2.2.8, 3.0.9, and 3.8 and later.

(This was found fairly recently.)

https://issues.apache.org/jira/browse/CASSANDRA-11850

To be safe, use Python 2.7.9.

Installing Python 2.7.9

http://knight76.tistory.com/entry/python-26-%EC%97%90%EC%84%9C-python-279-%EC%97%85%EA%B7%B8%EB%A0%88%EC%9D%B4%EB%93%9C-%ED%95%98%EA%B8%B0

Installing pip

http://knight76.tistory.com/entry/python-python-279%EC%97%90-%ED%95%B4%EB%8B%B9%EB%90%98%EB%8A%94-pip-%EC%84%A4%EC%B9%98%ED%95%98%EA%B8%B0

Install cqlsh.

$ sudo -E pip install cqlsh

Check the versions.

$ cqlsh --version

cqlsh 5.0.1

$ nodetool version

ReleaseVersion: 3.0.9

Running cqlsh may produce errors like the following.

$ cqlsh

Connection error: ('Unable to connect to any servers', {'::1': error(113, "Tried connecting to [('::1', 9042, 0, 0)]. Last error: No route to host"), '127.0.0.1': ProtocolError("cql_version '3.3.1' is not supported by remote (w/ native protocol). Supported versions: [u'3.4.0']",)})

$ cqlsh

Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

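These errors occur because the Python driver's CQL version does not match the server or because no reachable IP was given, so simply handle it as follows.

For reference, Cassandra 3.0.9 uses CQL version 3.4.0.

Add the following to ~/.bashrc.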

alias cqlsh="/usr/local/bin/cqlsh --cqlversion=3.4.0 11.11.11.17"

Run cqlsh again and it works.

$ cqlsh

Connected to StoryCluster at 172.17.56.91:9042.

[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]

Use HELP for help.

cqlsh>

63 posts in '2017/03'
