apache spark - Is MEMORY_AND_DISK always better than DISK_ONLY when persisting an RDD to disk?


When using Apache Spark, why would you choose to persist an RDD with the storage level DISK_ONLY rather than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?

Is there a use case where DISK_ONLY gives better performance than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?
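
For reference, a minimal sketch of how these storage levels are applied (assuming an existing SparkContext `sc`; the input path is hypothetical):

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.textFile("hdfs:///data/events")

    // Serialize all partitions and write them to disk; nothing is kept on the heap.
    rdd.persist(StorageLevel.DISK_ONLY)

    // Alternatives under discussion: keep deserialized objects in memory and
    // spill to disk only what does not fit ...
    // rdd.persist(StorageLevel.MEMORY_AND_DISK)
    // ... or keep serialized bytes in memory (more compact, but each read pays
    // a deserialization cost).
    // rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)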

A simple example: I may have one relatively large RDD, rdd1, and one smaller RDD, rdd2, and I want to persist both of them.

If I apply persist with MEMORY_AND_DISK to both, parts of both of them may be spilled to disk, resulting in slower reads for each.

But I could take a different approach: store rdd1 with DISK_ONLY. It may then happen that rdd2 fits entirely in memory via the cache() option, so it can be read faster. A sketch of the two approaches is below.
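
Here is a sketch of the scenario described above (rdd1, rdd2 and the paths are hypothetical; `sc` is assumed to be an existing SparkContext):

    import org.apache.spark.storage.StorageLevel

    val rdd1 = sc.textFile("hdfs:///data/large_dataset")  // relatively large
    val rdd2 = sc.textFile("hdfs:///data/small_lookup")   // relatively small

    // Approach 1: MEMORY_AND_DISK for both. rdd1 may fill the storage memory,
    // so partitions of both RDDs can end up spilled to disk.
    rdd1.persist(StorageLevel.MEMORY_AND_DISK)
    rdd2.persist(StorageLevel.MEMORY_AND_DISK)

    // Approach 2: send the big RDD straight to disk, leaving storage memory
    // free so the small RDD can stay fully cached in memory.
    rdd1.persist(StorageLevel.DISK_ONLY)
    rdd2.cache()  // equivalent to persist(StorageLevel.MEMORY_ONLY)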

