sql - Does SparkSQL support subquery? -


I am running this query in the Spark shell, but it gives me an error:

sqlContext.sql("select sal from samplecsv where sal < (select max(sal) from samplecsv)").collect().foreach(println)

Error:

java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier max found

select sal from samplecsv where sal < (select max(sal) from samplecsv)
                                              ^
    at scala.sys.package$.error(package.scala:27)

Can anyone explain this to me? Thanks.

Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:

select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)
select * from l where l.a in (select c from r)
select * from l where l.a not in (select c from r)
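As a rough sketch, the query from the question could be run in a Spark 2.0+ shell as follows. It assumes the predefined `spark` session; the sample rows and the `name` column are made up for illustration and are not the asker's data.

```scala
// Paste into spark-shell on Spark 2.0+, where `spark` (a SparkSession) is predefined.
import spark.implicits._

// Hypothetical stand-in for the asker's samplecsv table.
val sample = Seq(("alice", 1000), ("bob", 2000), ("carol", 3000)).toDF("name", "sal")
sample.createOrReplaceTempView("samplecsv")

// Uncorrelated scalar subquery in the WHERE clause.
val belowMax = spark.sql(
  "select sal from samplecsv where sal < (select max(sal) from samplecsv)")

belowMax.collect().foreach(println)
```

With this sample data, the rows for 1000 and 2000 should be printed, in no guaranteed order.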

Unfortunately, as of Spark 2.0 it is impossible to express the same logic using the DataFrame DSL.

Spark < 2.0

Spark supports subqueries in the FROM clause (same as Hive <= 0.12).

select col from (select * from t1 where bar) t2

It doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) couldn't be expressed using Spark without promoting them to a Cartesian join.

Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of function here.
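For instance, the failing query from the question can be rewritten as a join against a one-row derived table. This is only a sketch for a Spark 1.x shell (1.3 or later), assuming the predefined `sqlContext`; the sample rows, the `name` column, and the aliases `s`, `m` and `max_sal` are illustrative choices.

```scala
// Paste into spark-shell on Spark 1.3+ (1.x), where `sqlContext` is predefined.
import sqlContext.implicits._

// Hypothetical stand-in for the asker's samplecsv table.
val sample = Seq(("alice", 1000), ("bob", 2000), ("carol", 3000)).toDF("name", "sal")
sample.registerTempTable("samplecsv")

// The scalar subquery moves into the FROM clause as a one-row derived table,
// and the WHERE comparison becomes the join condition.
val belowMax = sqlContext.sql(
  """select s.sal
    |from samplecsv s
    |join (select max(sal) as max_sal from samplecsv) m
    |on s.sal < m.max_sal""".stripMargin)

belowMax.collect().foreach(println)
```

Because the join condition is not an equality, Spark may plan it as a nested-loop or Cartesian-style join, which is the cost mentioned above.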

