Is MEMORY_AND_DISK always better than DISK_ONLY when persisting an RDD to disk?
When using Apache Spark, why would you choose to persist an RDD with storage level DISK_ONLY rather than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?
Is there a use-case where DISK_ONLY gives better performance than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?
A simple example: suppose I have one relatively large RDD, rdd1, and one smaller RDD, rdd2, and I want to persist both of them.
If I apply persist with MEMORY_AND_DISK to both, partitions of both may be spilled to disk, resulting in slower reads for both.
But I could take a different approach: store rdd1 with DISK_ONLY. It may then happen that rdd2 fits entirely in memory (via the cache() option) and can be read faster.
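A minimal sketch of the two approaches, for illustration only (the RDD contents, sizes, and the PersistExample object name are made up; cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persist-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical RDDs: rdd1 is large, rdd2 is small.
    val rdd1 = sc.parallelize(1 to 10000000).map(i => (i, i.toString))
    val rdd2 = sc.parallelize(1 to 1000).map(i => (i, i * 2))

    // Naive approach: MEMORY_AND_DISK on both. If rdd1 does not fit,
    // executors spill its partitions to disk and may evict rdd2's
    // partitions too, so reads of both can hit disk.
    // rdd1.persist(StorageLevel.MEMORY_AND_DISK)
    // rdd2.persist(StorageLevel.MEMORY_AND_DISK)

    // Alternative: send the large RDD straight to disk and keep the
    // small one fully in memory.
    rdd1.persist(StorageLevel.DISK_ONLY)
    rdd2.cache()

    // Actions that materialize the RDDs under the chosen storage levels.
    println(rdd1.count())
    println(rdd2.count())

    spark.stop()
  }
}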