java - Spark get column names of nested json -


I'm trying to get the column names of a nested JSON via DataFrames. The schema is given below:

root
 |-- body: struct (nullable = true)
 |    |-- sw1: string (nullable = true)
 |    |-- sw2: string (nullable = true)
 |    |-- sw3: string (nullable = true)
 |    |-- sw420: string (nullable = true)
 |-- headers: struct (nullable = true)
 |    |-- enddate: string (nullable = true)
 |    |-- file: string (nullable = true)
 |    |-- startdate: string (nullable = true)

I can get the column names "body" and "headers" with df.columns(), but when I try to get the column names inside body (e.g. sw1, sw2, ...), df.select("body").columns just gives me the body column itself.

Any suggestions? :)

If the question is how to find the nested column names, you can do so by inspecting the schema of the DataFrame. The schema is represented as a StructType, whose fields can be objects of any other DataType (including other nested structs). If you want to discover all the fields, you'll have to walk the tree recursively. For example:

import org.apache.spark.sql.types._

def findFields(path: String, dt: DataType): Unit = dt match {
  case s: StructType =>
    s.fields.foreach(f => findFields(path + "." + f.name, f.dataType))
  case other =>
    println(s"$path: $other")
}

This walks the tree and prints out the leaf fields with their types:

val df = sqlContext.read.json(sc.parallelize("""{"a": {"b": 1}}""" :: Nil))
findFields("", df.schema)

which prints:

.a.b: LongType
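You don't need a Spark cluster to see how the recursion works. Below is a self-contained Scala sketch using toy stand-ins for DataType, StructType, and StructField (simplified mock classes mirroring Spark's names, not Spark's real API), modelled on the schema from the question. It collects the leaf paths into a Seq instead of printing them, which is often more useful than side-effecting output:

```scala
// Toy stand-ins for Spark's type hierarchy -- illustration only, not Spark's API.
sealed trait DataType
case class StructField(name: String, dataType: DataType)
case class StructType(fields: Seq[StructField]) extends DataType
case class SimpleType(name: String) extends DataType

// Same recursive walk as findFields, but accumulating results.
def leafPaths(path: String, dt: DataType): Seq[String] = dt match {
  case StructType(fields) =>
    fields.flatMap(f => leafPaths(path + "." + f.name, f.dataType))
  case SimpleType(name) =>
    Seq(s"$path: $name")
}

// A cut-down version of the schema from the question.
val schema = StructType(Seq(
  StructField("body", StructType(Seq(
    StructField("sw1", SimpleType("StringType")),
    StructField("sw2", SimpleType("StringType"))))),
  StructField("headers", StructType(Seq(
    StructField("file", SimpleType("StringType")))))))

println(leafPaths("", schema).mkString("\n"))
// .body.sw1: StringType
// .body.sw2: StringType
// .headers.file: StringType
```

With the real Spark classes, the pattern match is identical; the only differences are the import of org.apache.spark.sql.types._ and the extra leaf DataTypes (StringType, LongType, ArrayType, etc.) you may want to handle.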
