Here are some Spark SQL examples.
scala> val dataset = Seq("samuel", "jackson", "kin").toDF("name_string")
dataset: org.apache.spark.sql.DataFrame = [name_string: string]
scala> dataset.registerTempTable("names")
warning: there was one deprecation warning; re-run with -deprecation for details
(The warning appears because registerTempTable was deprecated in Spark 2.0; createOrReplaceTempView is the replacement.)
scala> sql("""select name_string from names""").show
+-----------+
|name_string|
+-----------+
| samuel|
| jackson|
| kin|
+-----------+
scala> sql("""select name_string from names where name_string ='kin' """).show
+-----------+
|name_string|
+-----------+
| kin|
+-----------+
LIKE searches need a bit of care.
A glob-style pattern such as '*' returns no results, because SQL LIKE treats '*' as a literal character; its wildcard is '%'.
scala> sql("""select name_string from names where name_string like '*' """).show
+-----------+
|name_string|
+-----------+
+-----------+
Using concat with LIKE makes the search work.
scala> sql("""select name_string from names where name_string like concat('%','sam','%') """).show
+-----------+
|name_string|
+-----------+
| samuel|
+-----------+
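The concat trick works, but a literal '%sam%' pattern should behave the same way. The LIKE semantics themselves can be sketched in plain Scala, with no Spark required, by translating the pattern to a regex (the `sqlLike` helper below is a hypothetical illustration, not part of Spark):

```scala
// Plain-Scala sketch of SQL LIKE semantics.
// '%' matches any sequence of characters; '_' matches exactly one.
// Every other character, including '*', is matched literally.
def sqlLike(value: String, pattern: String): Boolean = {
  val regex = pattern.flatMap {
    case '%' => ".*"
    case '_' => "."
    case c   => java.util.regex.Pattern.quote(c.toString)
  }
  value.matches(regex)
}

val names = Seq("samuel", "jackson", "kin")

// like '*' matches nothing: '*' is a literal asterisk, not a wildcard.
println(names.filter(sqlLike(_, "*")))     // List()

// like '%sam%' matches names containing "sam".
println(names.filter(sqlLike(_, "%sam%"))) // List(samuel)
```

This also explains the empty result above: '*' never matches unless the value is literally an asterisk.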
The DataFrame (ETL pipeline) style has long been available as an alternative to SQL statements.
(The ETL pipeline style is a bit more awkward to write.)
scala> dataset.groupBy("name_string").count().filter($"count" >= 1).show()
+-----------+-----+
|name_string|count|
+-----------+-----+
| jackson| 1|
| kin| 1|
| samuel| 1|
+-----------+-----+
scala> dataset.groupBy("name_string").count().filter($"count" >= 2).show()
+-----------+-----+
|name_string|count|
+-----------+-----+
+-----------+-----+
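The groupBy/count/filter pipeline above mirrors what plain Scala collections do. As a rough sketch of the same logic without Spark, using the names from the example:

```scala
val names = Seq("samuel", "jackson", "kin")

// Equivalent of dataset.groupBy("name_string").count(): name -> occurrence count.
val counts: Map[String, Int] =
  names.groupBy(identity).map { case (name, occurrences) => name -> occurrences.size }

// filter($"count" >= 1) keeps every name; each appears exactly once here.
println(counts.filter { case (_, c) => c >= 1 })

// filter($"count" >= 2) keeps nothing, matching the empty result above.
println(counts.filter { case (_, c) => c >= 2 }) // Map()
```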
scala> dataset.select("name_string").show()
+-----------+
|name_string|
+-----------+
| samuel|
| jackson|
| kin|
+-----------+
scala> dataset.select("name_string").where($"name_string".equalTo("samuel")).show()
+-----------+
|name_string|
+-----------+
| samuel|
+-----------+
scala> val dataset2 = Seq(("samuel", "01/05/2017"), ("noah", "01/05/2018"))
dataset2: Seq[(String, String)] = List((samuel,01/05/2017), (noah,01/05/2018))
scala> val dataset2 = Seq(("samuel", "01/05/2017"), ("noah", "01/05/2018")).toDF("name", "create_date")
dataset2: org.apache.spark.sql.DataFrame = [name: string, create_date: string]
scala> dataset2.registerTempTable("reservation")
warning: there was one deprecation warning; re-run with -deprecation for details
scala> sql("""SELECT * from reservation""").show
+------+-----------+
| name|create_date|
+------+-----------+
|samuel| 01/05/2017|
| noah| 01/05/2018|
+------+-----------+
To find the day of the week, combine unix_timestamp with from_unixtime and an 'EEEE' format:
scala> sql("""SELECT name,create_date,from_unixtime(unix_timestamp(create_date, 'MM/dd/yyyy'), 'EEEE') as day from reservation where name='samuel' """).show
+------+-----------+--------+
| name|create_date| day|
+------+-----------+--------+
|samuel| 01/05/2017|Thursday|
+------+-----------+--------+
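The from_unixtime(unix_timestamp(...), 'EEEE') combination is just an ordinary date parse followed by a weekday format. The same result can be cross-checked with plain java.time, independent of Spark (a sketch using the same patterns as the SQL above):

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Same patterns the SQL uses: 'MM/dd/yyyy' to parse, 'EEEE' to print the weekday.
val parser  = DateTimeFormatter.ofPattern("MM/dd/yyyy", Locale.ENGLISH)
val weekday = DateTimeFormatter.ofPattern("EEEE", Locale.ENGLISH)

val date = LocalDate.parse("01/05/2017", parser)
println(weekday.format(date)) // Thursday, matching the SQL result above
println(date.getDayOfWeek == DayOfWeek.THURSDAY)
```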
Next is a SQL example that uses a case class.
scala> case class Num(x:Int)
defined class Num
scala> val rdd=sc.parallelize(List(Num(1), Num(2), Num(3)))
rdd: org.apache.spark.rdd.RDD[Num] = ParallelCollectionRDD[12] at parallelize at <console>:34
scala> spark.createDataFrame(rdd).show
+---+
| x|
+---+
| 1|
| 2|
| 3|
+---+
scala> val df = spark.createDataFrame(rdd)
df: org.apache.spark.sql.DataFrame = [x: int]
scala> df.registerTempTable("num")
warning: there was one deprecation warning; re-run with -deprecation for details
scala> sql("""select * from num where x=2""").show
+---+
| x|
+---+
| 2|
+---+