scala - How to get ID of a map task in Spark? -


is there way id of map task in spark? example if each map task calls user defined function, can id of map task whithin user defined function?

i not sure mean id of map task can access task information using taskcontext:

import org.apache.spark.taskcontext  sc.parallelize(seq[int](), 4).mappartitions(_ => {     val ctx = taskcontext.get     val stageid = ctx.stageid     val partid = ctx.partitionid     val hostname = java.net.inetaddress.getlocalhost().gethostname()     iterator(s"stage: $stageid, partition: $partid, host: $hostname") }).collect.foreach(println) 

a similar functionality has been added pyspark in spark 2.2.0 (spark-18576):

from pyspark import taskcontext import socket  def task_info(*_):     ctx = taskcontext()     return ["stage: {0}, partition: {1}, host: {2}".format(         ctx.stageid(), ctx.partitionid(), socket.gethostname())]  x in sc.parallelize([], 4).mappartitions(task_info).collect():     print(x) 

Comments

Popular posts from this blog

authentication - Mongodb revoke acccess to connect test database -

r - Update two sets of radiobuttons reactively - shiny -

ios - Realm over CoreData should I use NSFetchedResultController or a Dictionary? -