What factors affect how many Spark jobs can run concurrently?
We have set up Spark Job Server to submit Spark jobs, but found that our 20-node (8 cores / 128 GB memory per node) Spark cluster can only run about 10 Spark jobs concurrently.
Can someone share detailed info on the factors that affect how many Spark jobs can run concurrently? How can we tune the configuration to take full advantage of the cluster?
The question is missing some context, but first: it seems that Spark Job Server limits the number of concurrent jobs (unlike Spark itself, which puts a limit on the number of tasks, not jobs).
From application.conf:

    # Number of jobs that can be run simultaneously per context
    # If not set, defaults to number of cores on the machine where jobserver is running
    max-jobs-per-context = 8
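If that default of 8 is what is holding you back, you could raise it in the job server's application.conf. A minimal sketch, assuming the stock spark-jobserver config layout where this key sits under spark.jobserver; the value 20 is only illustrative:

    spark {
      jobserver {
        # allow up to 20 concurrent jobs per context instead of the default 8
        max-jobs-per-context = 20
      }
    }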
If that's not the issue (i.e. you set the limit higher, or you are using more than one context), then the total number of cores in the cluster (8 * 20 = 160) is the maximum number of concurrent tasks. If each of your jobs creates 16 tasks, at most 10 jobs can run at once (10 * 16 = 160), and Spark will queue the next incoming job until CPUs become available.
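One way to let several jobs fit side by side is to cap the cores each application may take. This is only a sketch assuming standalone mode (spark.cores.max is not honored by YARN), and with Spark Job Server these settings would normally go into the context configuration rather than into job code; the numbers are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: cap this application at 16 cores total (2 cores on each
    // of up to 8 executors), so ten such applications can share a 160-core cluster.
    val conf = new SparkConf()
      .setAppName("capped-job")
      .set("spark.cores.max", "16")       // standalone/Mesos: total cores for this app
      .set("spark.executor.cores", "2")   // cores per executor

    val sc = new SparkContext(conf)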
Spark creates one task per partition of the input data, and the number of partitions is decided according to the partitioning of the input on disk, or by calling repartition or coalesce on the RDD/DataFrame to manually change the partitioning. Some actions that operate on more than one RDD (e.g. union) may also change the number of partitions.
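A quick way to see how these operations affect the task count is to check getNumPartitions. This is just a sketch with made-up data:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // 1,000,000 numbers spread over 8 partitions -> a stage over this RDD runs 8 tasks
    val rdd = sc.parallelize(1 to 1000000, numSlices = 8)
    println(rdd.getNumPartitions)                  // 8

    println(rdd.repartition(160).getNumPartitions) // 160 tasks per stage (full shuffle)
    println(rdd.coalesce(4).getNumPartitions)      // 4 tasks per stage (no shuffle)

    // union of RDDs without a partitioner simply concatenates their partition lists
    println(rdd.union(rdd).getNumPartitions)       // 16

    spark.stop()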