sql - Does SparkSQL support subquery?

January 15, 2015

I am running this query in the Spark shell, but it gives me an error:

sqlContext.sql("select sal from samplecsv where sal < (select max(sal) from samplecsv)").collect().foreach(println)

Error:

java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier max found

select sal from samplecsv where sal < (select max(sal) from samplecsv)
                                              ^
    at scala.sys.package$.error(package.scala:27)

Can anyone explain this to me? Thanks.

Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Examples include:

select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)

select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)

Unfortunately, as of now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.
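As a rough illustration of the Spark 2.0+ behaviour described above, the sketch below runs the query from the question, plus one of the EXISTS forms, through SQL. The sample rows and the contents of the tables l and r are made up for the example, and the session settings (local master, app name) are assumptions; in spark-shell the existing session would be reused.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("subquery-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in data for the asker's samplecsv table.
Seq(1000, 2000, 3000).toDF("sal").createOrReplaceTempView("samplecsv")

// Uncorrelated scalar subquery in the WHERE clause (the query from the question).
val belowMax = spark.sql(
  "select sal from samplecsv where sal < (select max(sal) from samplecsv)")
belowMax.collect().foreach(println)

// Hypothetical tables l(a) and r(c) for the correlated EXISTS form.
Seq(1, 2, 3).toDF("a").createOrReplaceTempView("l")
Seq(2, 3, 4).toDF("c").createOrReplaceTempView("r")

val matched = spark.sql(
  "select * from l where exists (select * from r where l.a = r.c)")
matched.collect().foreach(println)
```

With this sample data the first query should return every salary except the largest one, and the second should return the rows of l whose a value also appears in r.c.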

Spark < 2.0

Spark supports subqueries in the FROM clause (same as Hive <= 0.12).

select col from (select * from t1 where bar) t2

It doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) couldn't be expressed using Spark without promoting them to a Cartesian join.

Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a join, there is no loss of function here.
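To make the join rewrite concrete, here is a minimal sketch of how the query from the question might be expressed on Spark < 2.0: the aggregate moves into a FROM-clause subquery and the comparison becomes the join condition. It assumes Spark 1.3 or later (for toDF), uses made-up sample rows, and introduces the aliases s, m and max_sal purely for illustration; it has not been checked against every 1.x release.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// spark-shell already provides `sc` and `sqlContext`; skip these two lines there.
val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("join-rewrite-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical stand-in data for the asker's samplecsv table.
Seq(Tuple1(1000), Tuple1(2000), Tuple1(3000))
  .toDF("sal")
  .registerTempTable("samplecsv")

// The WHERE-clause subquery becomes a one-row derived table that is joined
// back to the original table on the comparison.
val belowMax = sqlContext.sql(
  """select s.sal
    |from samplecsv s
    |join (select max(sal) as max_sal from samplecsv) m
    |on s.sal < m.max_sal""".stripMargin)
belowMax.collect().foreach(println)
```

Because the derived table m holds a single row, the non-equi join here stays cheap; a correlated subquery rewritten this way can be far more expensive, which is the Cartesian join concern mentioned above.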
