java - Spark get column names of nested json
I'm trying to get the column names of a nested JSON via DataFrames. The schema is given below:
root
 |-- body: struct (nullable = true)
 |    |-- sw1: string (nullable = true)
 |    |-- sw2: string (nullable = true)
 |    |-- sw3: string (nullable = true)
 |    |-- sw420: string (nullable = true)
 |-- headers: struct (nullable = true)
 |    |-- enddate: string (nullable = true)
 |    |-- file: string (nullable = true)
 |    |-- startdate: string (nullable = true)
I can get the column names "body" and "headers" with df.columns, but when I try to get the column names inside body (e.g. sw1, sw2, ...), df.select("body").columns just gives me the body column.

Any suggestions? :)
If the question is how to find the nested column names, you can do so by inspecting the schema of the DataFrame. The schema is represented as a StructType, which can have fields of other DataType objects (including other nested structs). If you want to discover all the fields, you'll have to walk the tree recursively. For example:
import org.apache.spark.sql.types._

// Recursively descend into struct fields, printing the full path
// and data type of every leaf field.
def findFields(path: String, dt: DataType): Unit = dt match {
  case s: StructType =>
    s.fields.foreach(f => findFields(path + "." + f.name, f.dataType))
  case other =>
    println(s"$path: $other")
}
This walks the tree and prints out the leaf fields and their type:
val df = sqlContext.read.json(sc.parallelize("""{"a": {"b": 1}}""" :: Nil))
findFields("", df.schema)

prints:

.a.b: LongType
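If you only need the immediate child column names of a single struct (here, body), a full recursive walk isn't necessary: the field names can be read straight off the schema. A minimal sketch, assuming the asker's DataFrame is bound to a variable named df:

import org.apache.spark.sql.types.StructType

// Look up the "body" column in the schema and, if it is a struct,
// return its immediate field names (sw1, sw2, sw3, sw420).
val bodyFields: Array[String] = df.schema("body").dataType match {
  case s: StructType => s.fieldNames
  case _             => Array.empty[String]
}

df.schema("body") returns the StructField for that column, and fieldNames lists only the first level of nesting, so more deeply nested structs would still need the recursive approach above.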