scala - Spark: Incremental collect() to a partition causes OutOfMemory in Heap


I have the following code. I need to print the RDD to the console, so I am collecting the large RDD in smaller chunks by collecting it one partition at a time, to avoid collecting the entire RDD at once. However, when I monitor the heap and the GC log, it seems nothing is ever being GC'd: the heap keeps growing until it hits an OutOfMemory error. If my understanding is correct, once the println statement below has executed for a collected chunk, that chunk is no longer needed and should be safe to GC, but that is not what I see in the GC log; each call to collect accumulates until OOM. Does anyone know why the collected data is not being GC'd?

 val writes = partitions.foreach { partition =>
   val rddPartition = rdds.mapPartitionsWithIndex({
     case (index, data) => if (index == partition.index) data else Iterator[Words]()
   }, false).collect().toSeq
   val partialReport = Report(rddPartition, reportId, dateCreated)
   println(partialReport.name)
 }

If the dataset is huge, the master (driver) node can't handle it and will shut down. You may try writing the partitions out to files instead (e.g. with saveAsTextFile) and then reading each file back one at a time.
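A minimal sketch of that idea, assuming an RDD of strings standing in for the question's RDD and a hypothetical output directory; saveAsTextFile writes one part-file per partition, and each part-file is then read back and collected individually so only one chunk is ever held on the driver (the Report construction from the question is omitted):

 import org.apache.spark.sql.SparkSession
 import org.apache.hadoop.fs.{FileSystem, Path}

 object PartitionedReport {
   def main(args: Array[String]): Unit = {
     val spark = SparkSession.builder().appName("PartitionedReport").getOrCreate()
     val sc = spark.sparkContext

     // Hypothetical RDD standing in for `rdds` from the question.
     val rdds = sc.parallelize(Seq("alpha", "beta", "gamma"), numSlices = 3)

     // Write each partition as its own part-file instead of collect()-ing it all.
     val outputDir = "/tmp/words-report" // assumed path, adjust as needed
     rdds.saveAsTextFile(outputDir)

     // List the part-files (one per partition of the original RDD).
     val fs = FileSystem.get(sc.hadoopConfiguration)
     val partFiles = fs.listStatus(new Path(outputDir))
       .map(_.getPath)
       .filter(_.getName.startsWith("part-"))
       .sortBy(_.getName)

     // Read the part-files back one at a time, so only a single chunk
     // of data is materialized on the driver at any moment.
     partFiles.foreach { path =>
       val chunk = sc.textFile(path.toString).collect().toSeq
       println(s"${path.getName}: ${chunk.size} records")
     }

     spark.stop()
   }
 }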

