java - Spark get column names of nested json


I'm trying to get the column names of a nested JSON via DataFrames. The schema is given below:

root
 |-- body: struct (nullable = true)
 |    |-- sw1: string (nullable = true)
 |    |-- sw2: string (nullable = true)
 |    |-- sw3: string (nullable = true)
 |    |-- sw420: string (nullable = true)
 |-- headers: struct (nullable = true)
 |    |-- enddate: string (nullable = true)
 |    |-- file: string (nullable = true)
 |    |-- startdate: string (nullable = true)

I can get the column names "body" and "headers" with df.columns(), but when I try to get the column names inside body (e.g. sw1, sw2, ...), df.select("body").columns just gives me the body column itself, as sketched below.
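A minimal sketch of the behavior described, assuming df holds a DataFrame with the schema above:

df.columns                 // Array(body, headers) -- top-level columns only
df.select("body").columns  // Array(body) -- the struct column itself, not its fields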

Any suggestions? :)

If the question is how to find the nested column names, you can do it by inspecting the schema of the DataFrame. The schema is represented as a StructType, which can have fields of other DataType objects (including other nested structs). If you want to discover all the fields you'll have to walk the tree recursively. For example:

import org.apache.spark.sql.types._

// Recursively walk a DataType, printing the full path and type of each leaf field.
def findFields(path: String, dt: DataType): Unit = dt match {
  case s: StructType =>
    s.fields.foreach(f => findFields(path + "." + f.name, f.dataType))
  case other =>
    println(s"$path: $other")
}

This walks the tree and prints out the leaf fields and their types:

val df = sqlContext.read.json(sc.parallelize("""{"a": {"b": 1}}""" :: Nil))
findFields("", df.schema)

prints: .a.b: LongType
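As a follow-up: if you only need the field names one level down (e.g. the fields inside body), the recursion isn't necessary. A sketch that pulls them straight out of the schema, assuming df is the questioner's DataFrame with the body/headers schema above:

import org.apache.spark.sql.types._

// Look up the "body" field in the schema and, if it is a struct, list its field names.
df.schema("body").dataType match {
  case s: StructType => s.fieldNames.foreach(println)  // sw1, sw2, sw3, sw420
  case other => println(s"body is not a struct: $other")
}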
