While working through RDD examples, I occasionally run into the following error.


java.util.NoSuchElementException: None.get 



A quick web search showed that it has already been filed in the Spark Jira:


https://issues.apache.org/jira/browse/SPARK-16599





17/03/14 20:33:06 ERROR Executor: Exception in task 1.0 in stage 4.0 (TID 33)
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:670)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:289)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 4.0 failed 1 times, most recent failure: Lost task 7.0 in stage 4.0 (TID 39, localhost, executor driver): java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:670)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:289)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:917)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:915)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.foreach(RDD.scala:915)
  ... 52 elided
Caused by: java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:347)
  at scala.None$.get(Option.scala:345)
  at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
  at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:670)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:289)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
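The top frames show `scala.None$.get` being called from `BlockInfoManager.releaseAllLocksForTask`. In other words, some lookup there returned `None`, and `.get` was called on it anyway. In Scala, `Option.get` on `None` always throws `java.util.NoSuchElementException` with the message "None.get". Below is a minimal sketch of that failure mode next to the usual defensive alternative, `getOrElse`. The registry map and method names (`readLocksByTask`, `releaseUnsafe`, `releaseSafe`) are hypothetical stand-ins chosen for illustration and are not Spark's internal code. The actual fix is in the Spark commits linked later in this post.

```scala
import scala.collection.mutable

object NoneGetDemo {
  // Hypothetical per-task lock registry (illustration only, not Spark's class)
  val readLocksByTask: mutable.HashMap[Long, Seq[String]] =
    mutable.HashMap[Long, Seq[String]]()

  // Unsafe: remove() returns None for an unknown task, and None.get throws
  // java.util.NoSuchElementException("None.get")
  def releaseUnsafe(taskId: Long): Seq[String] =
    readLocksByTask.remove(taskId).get

  // Defensive: treat "no entry for this task" as "no locks to release"
  def releaseSafe(taskId: Long): Seq[String] =
    readLocksByTask.remove(taskId).getOrElse(Seq.empty)

  def main(args: Array[String]): Unit = {
    readLocksByTask(1L) = Seq("rdd_0_1")
    println(releaseSafe(1L)) // List(rdd_0_1)
    println(releaseSafe(2L)) // List()
    try {
      releaseUnsafe(2L)
      ()
    } catch {
      case e: NoSuchElementException => println(e.getMessage) // None.get
    }
  }
}
```

Falling back to an empty collection simply means "nothing to release". Whether that is the right semantics depends on why the entry is missing in the first place, which is something only the real Spark code can answer.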




On March 15, 2017, Sean Owen of Cloudera opened a pull request for this issue. I hope it gets merged.


https://github.com/apache/spark/pull/17290



The source patches:

https://github.com/apache/spark/commit/27234e154db18cbc614053446713636a69046090


https://github.com/apache/spark/commit/5da4bcffa1b39ea8c83fe63a09e68297be371784


It looks like this will be patched soon, after which it should work properly.




Posted by '김용환'