Posts from 2018/10/29

[spark] Example: keeping the StructType fields and the raw Row value together

scala 2018. 10. 29. 19:39

If you want to see the original JSON string and the parsed (structured) fields side by side in Spark, the code below is a useful reference.

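import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import spark.implicits._ // required for $"..." and for .as[...]

// Schema of the JSON payload carried in the Kafka message value
val schema = StructType(
  List(
    StructField("year", StringType, nullable = true),
    StructField("month", StringType, nullable = true),
    StructField("day", StringType, nullable = true)
  )
)

val ds = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", config.getString(s"kafka.$phase.brokers"))
  .option("startingOffsets", "latest")
  .option("subscribe", config.getString(s"kafka.$phase.topic.name"))
  .load()
  // The Kafka source always delivers key and value as binary, so the
  // key.deserializer / value.deserializer options are dropped and the
  // value is cast to a string here instead.
  .selectExpr("CAST(value AS STRING)")
  // Parse the JSON into a struct, and keep the raw string next to it
  .select(from_json($"value", schema).as("data"), col("value"))
  // Flatten to (year, month, day, value)
  .select("data.*", "value")
  .as[(String, String, String, String)]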
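
To try the same parsing step without a Kafka broker, the sketch below runs it on a small in-memory DataFrame. It is only an illustration: the object name `ParsedAndRawSketch`, the local master setting, and the sample JSON string are assumptions that do not come from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical stand-alone object, used only for this illustration.
object ParsedAndRawSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("parsed-and-raw")
      .getOrCreate()
    import spark.implicits._

    val schema = StructType(
      List(
        StructField("year", StringType, nullable = true),
        StructField("month", StringType, nullable = true),
        StructField("day", StringType, nullable = true)
      )
    )

    // Stand-in for the Kafka "value" column after CAST(value AS STRING).
    val raw = Seq("""{"year":"2018","month":"10","day":"29"}""").toDF("value")

    val ds = raw
      .select(from_json(col("value"), schema).as("data"), col("value"))
      .select("data.*", "value") // columns: year, month, day, value
      .as[(String, String, String, String)]

    ds.show(false) // print the parsed fields next to the untruncated raw JSON
    spark.stop()
  }
}
```

The tuple has four String slots because the three schema fields are followed by the raw `value` column. A case class with matching field names would work with `.as[...]` as well.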

Attribution, NonCommercial, ShareAlike

Posted by '김용환'

Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

scala 2018. 10. 29. 19:35

When working with Spark streaming, you will run into the error below quite often if you do not have a good grasp of Encoders.

Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

Writing it off as a plain Serializable issue does not quite fit.

It is a good prompt to study Spark in more depth.

In particular, it helps deepen your understanding of DataFrame and Dataset.
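
As a rough sketch of where this message comes from: it is a compile-time error, raised when a Dataset operation needs an implicit `Encoder[T]` and none is in scope. The example below shows a type that `spark.implicits._` covers and one that it does not. The names `Event`, `Legacy`, and `EncoderSketch` are hypothetical, and using a Kryo encoder for the unsupported class is just one possible workaround, not necessarily what the linked articles recommend.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical types, used only for this illustration.
case class Event(year: String, month: String, day: String) // Product type: encoder is derived
class Legacy(val id: Int)                                  // plain class: no built-in encoder

object EncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("encoder-sketch")
      .getOrCreate()

    // Brings encoders for primitives, tuples, and case classes into scope.
    import spark.implicits._

    val events: Dataset[Event] = Seq(Event("2018", "10", "29")).toDS()

    // With no Encoder[Legacy] in scope, the next line does not compile and
    // reports "Unable to find encoder for type stored in a Dataset ...":
    // val broken = events.map(e => new Legacy(e.day.toInt))

    // One workaround: supply an explicit encoder. Kryo stores the object
    // as a single binary column, so its fields are not queryable as columns.
    implicit val legacyEncoder: Encoder[Legacy] = Encoders.kryo[Legacy]
    val legacy: Dataset[Legacy] = events.map(e => new Legacy(e.day.toInt))

    println(legacy.collect().map(_.id).mkString(","))
    spark.stop()
  }
}
```

Where possible, modelling the data as a case class or tuple is usually the simpler route, since the resulting Dataset keeps a proper column layout.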

https://stackoverflow.com/questions/39433419/encoder-error-while-trying-to-map-dataframe-row-to-updated-row

https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-Encoder.html

https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html

