apache spark - Is MEMORY_AND_DISK always better than DISK_ONLY when persisting an RDD to disk? -


When using Apache Spark, why would you choose to persist an RDD with storage level DISK_ONLY rather than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?

Is there a use case where DISK_ONLY gives better performance than MEMORY_AND_DISK or MEMORY_AND_DISK_SER?

A simple example: suppose I have one relatively large RDD, rdd1, and one smaller RDD, rdd2, and I want to persist both of them.

If I apply persist with MEMORY_AND_DISK to both, parts of both may be spilled to disk, resulting in slower reads.

But I could take a different approach: store rdd1 with DISK_ONLY. It may then turn out that rdd2 fits entirely in memory via the cache() option, so it can be read faster.

