scala - How to get ID of a map task in Spark? -
is there way id of map task in spark? example if each map task calls user defined function, can id of map task whithin user defined function?
i not sure mean id of map task can access task information using taskcontext:
import org.apache.spark.taskcontext sc.parallelize(seq[int](), 4).mappartitions(_ => { val ctx = taskcontext.get val stageid = ctx.stageid val partid = ctx.partitionid val hostname = java.net.inetaddress.getlocalhost().gethostname() iterator(s"stage: $stageid, partition: $partid, host: $hostname") }).collect.foreach(println) a similar functionality has been added pyspark in spark 2.2.0 (spark-18576):
from pyspark import taskcontext import socket def task_info(*_): ctx = taskcontext() return ["stage: {0}, partition: {1}, host: {2}".format( ctx.stageid(), ctx.partitionid(), socket.gethostname())] x in sc.parallelize([], 4).mappartitions(task_info).collect(): print(x)
Comments
Post a Comment