apache spark - Is MEMORY_AND_DISK always better than DISK_ONLY when persisting an RDD to disk?


When using Apache Spark, why would one choose to persist an RDD using the storage level DISK_ONLY rather than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?

Is there a use case in which DISK_ONLY gives better performance than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?

A simple example: I may have one relatively large RDD, rdd1, and one smaller RDD, rdd2, and I want to store both of them.

If I apply persist with MEMORY_AND_DISK to both, both of them may be spilled to disk, resulting in slower reads.
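For concreteness, a minimal sketch of that first approach. Only rdd1, rdd2, and the storage level come from the question; the SparkSession setup and the data sizes are assumptions made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder()
      .appName("persist-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical data: one large RDD and one much smaller one.
    val rdd1 = sc.parallelize(1 to 100000000) // relatively large
    val rdd2 = sc.parallelize(1 to 1000)      // much smaller

    // Approach 1: the same storage level for both. If rdd1 does not fit,
    // its partitions spill to disk and can also push rdd2's blocks out of
    // memory, so reads of both may end up hitting disk.
    rdd1.persist(StorageLevel.MEMORY_AND_DISK)
    rdd2.persist(StorageLevel.MEMORY_AND_DISK)
    rdd1.count() // actions materialize the persisted data
    rdd2.count()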

But I may take a different approach: I may store rdd1 with DISK_ONLY. It may then happen that rdd2 fits entirely in memory via the cache() option, so I am able to read it faster.
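And a sketch of that alternative, continuing the same hypothetical session as above:

    // Approach 2: send the large RDD straight to disk so it never competes
    // for storage memory, and keep the small RDD purely in memory.
    rdd1.unpersist()
    rdd2.unpersist()

    rdd1.persist(StorageLevel.DISK_ONLY)
    rdd2.cache() // for RDDs, cache() is persist(StorageLevel.MEMORY_ONLY)

    rdd1.count()
    rdd2.count() // rdd2 should now be served entirely from memory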

