sql - Does SparkSQL support subquery?
I am running this query in the Spark shell, but it gives me an error:
sqlContext.sql("select sal from samplecsv where sal < (select MAX(sal) from samplecsv)").collect().foreach(println)
Error:
java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier MAX found
select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
                                              ^
at scala.sys.package$.error(package.scala:27)
Can anyone explain this to me? Thanks.
Spark 2.0+
Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:
select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)
select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
Unfortunately, as of now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.
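For the query from the question, a minimal sketch of how it might be run on Spark 2.0+ is shown here. The three-row samplecsv view is a hypothetical stand-in for the asker's data, and the session settings (local master, app name) are assumptions for a standalone run; in spark-shell the existing `spark` session is reused.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local session for illustration; getOrCreate() reuses
// the `spark` session that spark-shell 2.0+ already provides.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("subquery-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the samplecsv table from the question.
Seq(1000, 2000, 3000).toDF("sal").createOrReplaceTempView("samplecsv")

// Uncorrelated scalar subquery in the WHERE clause.
spark.sql(
  "SELECT sal FROM samplecsv WHERE sal < (SELECT MAX(sal) FROM samplecsv)"
).show()
```

The same statement is what fails to parse with sqlContext.sql on versions before 2.0.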
Spark < 2.0
Spark supports subqueries in the FROM clause (same as Hive <= 0.12).
SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
It doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) couldn't be expressed using Spark without promoting them to a Cartesian join.
Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of function here.
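As an illustration of the JOIN rewrite, the query from the question could be expressed roughly as follows on Spark 1.x. This is a sketch, not a tested recipe: it assumes a spark-shell 1.x session where `sqlContext` is predefined, the three-row samplecsv table is hypothetical, and the aliases `s`, `m`, and `max_sal` are introduced only for this example.

```scala
// Assumption: running inside spark-shell 1.x, where `sqlContext` already exists.
import sqlContext.implicits._

// Hypothetical stand-in for the samplecsv table from the question.
// Tuple1 is used because toDF on a local Seq expects Product elements in 1.x.
Seq(Tuple1(1000), Tuple1(2000), Tuple1(3000))
  .toDF("sal")
  .registerTempTable("samplecsv")

// The MAX is computed in a FROM-clause subquery (supported before 2.0),
// then joined back so the comparison moves from WHERE into the join condition.
sqlContext.sql(
  """SELECT s.sal
    |FROM samplecsv s
    |JOIN (SELECT MAX(sal) AS max_sal FROM samplecsv) m
    |ON s.sal < m.max_sal""".stripMargin
).collect().foreach(println)
```

Because the inner query returns a single row, the non-equi join here stays cheap; a correlated subquery rewritten this way may need a much larger join.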