java - Spark get column names of nested json
I'm trying to get the column names of a nested JSON via DataFrames. The schema is given below:
root
 |-- body: struct (nullable = true)
 |    |-- sw1: string (nullable = true)
 |    |-- sw2: string (nullable = true)
 |    |-- sw3: string (nullable = true)
 |    |-- sw420: string (nullable = true)
 |-- headers: struct (nullable = true)
 |    |-- enddate: string (nullable = true)
 |    |-- file: string (nullable = true)
 |    |-- startdate: string (nullable = true)
I can get the column names "body" and "headers" with df.columns, but when I try to get the column names inside body (e.g. sw1, sw2, ...), df.select("body").columns just gives me the body column.

Any suggestions? :)
If the question is how to find the nested column names, you can do so by inspecting the schema of the DataFrame. The schema is represented as a StructType, which can have fields of other DataType objects (including other nested structs). If you want to discover all the fields, you'll have to walk the tree recursively. For example:
import org.apache.spark.sql.types._

// Recursively descend into struct fields, printing the full path
// and data type of every leaf field.
def findFields(path: String, dt: DataType): Unit = dt match {
  case s: StructType =>
    s.fields.foreach(f => findFields(path + "." + f.name, f.dataType))
  case other =>
    println(s"$path: $other")
}
This walks the tree and prints out the leaf fields and their type:
val df = sqlContext.read.json(sc.parallelize("""{"a": {"b": 1}}""" :: Nil))
findFields("", df.schema)

prints:

.a.b: LongType
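If you only need the immediate child column names of a single struct (here, body), a full recursive walk isn't necessary: the field names can be read straight off the schema. A minimal sketch, assuming the asker's DataFrame is bound to a variable named df:

import org.apache.spark.sql.types.StructType

// Look up the "body" column in the schema and, if it is a struct,
// return its immediate field names (sw1, sw2, sw3, sw420).
val bodyFields: Array[String] = df.schema("body").dataType match {
  case s: StructType => s.fieldNames
  case _             => Array.empty[String]
}

df.schema("body") returns the StructField for that column, and fieldNames lists only the first level of nesting, so more deeply nested structs would still need the recursive approach above.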