What factors affect how many Spark jobs can run concurrently?


We have set up Spark Job Server, to which our Spark jobs are submitted, but we found that our 20-node (8 cores / 128 GB memory per node) Spark cluster can only run about 10 Spark jobs concurrently.

Can you share detailed information on the factors that affect how many Spark jobs can run concurrently? How can we tune the configuration to take full advantage of the cluster?

The question is missing some context, but first: it seems that Spark Job Server limits the number of concurrent jobs (unlike Spark itself, which puts a limit on the number of tasks, not jobs):

From application.conf:

    # Number of jobs that can be run simultaneously per context.
    # If not set, defaults to number of cores on machine where jobserver is running.
    max-jobs-per-context = 8

If that's not the issue (you set the limit higher, or are using more than one context), then the total number of cores in the cluster (8 * 20 = 160) is the maximum number of concurrent tasks. If each of your jobs creates 16 tasks, those 160 cores are fully occupied by 10 jobs, and Spark will queue the next incoming job until CPUs become available.
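One way to let more jobs run side by side is to cap the cores a single job may claim. A minimal sketch, assuming a standalone or Mesos cluster manager (where spark.cores.max applies); the application name and the exact numbers are illustrative, not taken from your setup:

    import org.apache.spark.sql.SparkSession

    // Illustrative values only: cap this job at 16 of the 160 cores so that
    // roughly 10 such jobs can hold executors at the same time.
    val spark = SparkSession.builder()
      .appName("capped-cores-job")
      .config("spark.cores.max", "16")       // total cores this application may use (standalone/Mesos)
      .config("spark.executor.cores", "4")   // cores per executor
      .config("spark.executor.memory", "16g")
      .getOrCreate()

Lower per-job caps leave cores free for other jobs; the trade-off is that each individual job finishes more slowly.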

Spark creates a task per partition of the input data, and the number of partitions is decided by the partitioning of the input on disk, or by calling repartition or coalesce on the RDD/DataFrame to change the partitioning manually. Operations that combine more than one RDD (e.g. union) may also change the number of partitions.
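As a rough sketch of how this plays out (the input path and partition counts are placeholders, and the spark session is assumed from the previous example):

    // The initial partition count is decided by the input splits on disk.
    val rdd = spark.sparkContext.textFile("hdfs:///path/to/input")
    println(rdd.getNumPartitions)

    val wider    = rdd.repartition(160)   // one task per core across the cluster
    val narrower = rdd.coalesce(10)       // fewer, larger tasks (avoids a shuffle)
    val combined = rdd.union(narrower)    // union: partition counts of both inputs add up

    println(wider.getNumPartitions)       // 160
    println(narrower.getNumPartitions)    // 10
    println(combined.getNumPartitions)    // original count + 10

Fewer partitions per job means fewer concurrent tasks per job, which leaves more cores available for other jobs running at the same time.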

