Practicing with Spark's join and Scala on Array[String]
I'm new to both Spark and Scala, and I'm trying to practice the join command in Spark.
I have two CSV files:
ads.csv
5de3ae82-d56a-4f70-8738-7e787172c018,adprovider1
f1b6c6f4-8221-443d-812e-de857b77b2f4,adprovider2
aca88cd0-fe50-40eb-8bda-81965b377827,adprovider1
940c138a-88d3-4248-911a-7dbe6a074d9f,adprovider3
983bb5e5-6d5b-4489-85b3-00e1d62f6a3a,adprovider3
00832901-21a6-4888-b06b-1f43b9d1acac,adprovider1
9a1786e1-ab21-43e3-b4b2-4193f572acbc,adprovider1
50a78218-d65a-4574-90de-0c46affbe7f3,adprovider5
d9bb837f-c85d-45d4-95f2-97164c62aa42,adprovider4
611cf585-a8cf-43e9-9914-c9d1dc30dab5,adprovider1

impression.csv is:
5de3ae82-d56a-4f70-8738-7e787172c018,publisher1
f1b6c6f4-8221-443d-812e-de857b77b2f4,publisher2
aca88cd0-fe50-40eb-8bda-81965b377827,publisher1
940c138a-88d3-4248-911a-7dbe6a074d9f,publisher3
983bb5e5-6d5b-4489-85b3-00e1d62f6a3a,publisher3
00832901-21a6-4888-b06b-1f43b9d1acac,publisher1
9a1786e1-ab21-43e3-b4b2-4193f572acbc,publisher1
611cf585-a8cf-43e9-9914-c9d1dc30dab5,publisher1

I want to join them using the first column (the id) as the key, keeping both values.
So I read them in like this:
    val ads = sc.textFile("ads.csv")
    ads: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21

    val impressions = sc.textFile("impressions.csv")
    impressions: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21

OK, now I have to make key/value pairs:

    val adpairs = ads.map(line => line.split(","))
    val impressionpairs = impressions.map(line => line.split(","))
    res11: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[6] at map at <console>:23
    res13: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[7] at map at <console>:23

But I can't join them:
    val result = impressionpairs.join(adpairs)
    <console>:29: error: value join is not a member of org.apache.spark.rdd.RDD[Array[String]]
           val result = impressionpairs.join(adpairs)

Do I need to convert them into a pair format?
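The error is about the element type, not the data. In Spark, `join` is not a method of `RDD` itself: it is defined in `PairRDDFunctions`, which Spark makes available through an implicit conversion only when the RDD's elements are two-element tuples `(K, V)` (automatic since Spark 1.3; older versions needed `import org.apache.spark.SparkContext._`). An `RDD[Array[String]]` holds the right values in the wrong shape. The sketch below is plain Scala with no Spark involved; `innerJoin` is a hypothetical helper written only to mimic that type restriction, so it is a model of the idea, not Spark's implementation.

```scala
// Hypothetical plain-Scala model of a pair join (NOT Spark's implementation).
// Like Spark's join, it only accepts collections of (key, value) tuples.
def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
  // Index the right side by key so each left record can look up its matches.
  val rightByKey: Map[K, Seq[W]] =
    right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  for {
    (k, v) <- left
    w <- rightByKey.getOrElse(k, Seq.empty)
  } yield (k, (v, w))
}

val line = "5de3ae82-d56a-4f70-8738-7e787172c018,publisher1"

// split returns an Array[String]: the right data, but not a key/value shape.
val asArray: Array[String] = line.split(",")

// Building a tuple gives the (key, value) shape the join signature requires.
val asPair: (String, String) = (asArray(0), asArray(1))

// innerJoin(Seq(asArray), Seq(asArray))  // does not compile: Array[String] is not a (K, V) tuple
val joined = innerJoin(Seq(asPair), Seq(("5de3ae82-d56a-4f70-8738-7e787172c018", "adprovider1")))
```

The commented-out call fails at compile time for the same reason the Spark call does: the element type has to be a tuple before any join method is available.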
You're almost there; you just need to transform each Array[String] into a key-value pair (a tuple), like this:
    val adpairs = ads.map(line => {
      val substrings = line.split(",")
      (substrings(0), substrings(1))
    })

(and the same for impressionpairs)
That will give you RDDs of type RDD[(String, String)], which can be joined :)
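To see what the joined result looks like, here is a Spark-free sketch that applies the same conversion to the sample rows from the question using ordinary Scala collections. In Spark, `impressionpairs.join(adpairs)` returns an `RDD[(String, (String, String))]`: each id paired with its (publisher, ad provider) values. Like Spark's `join`, this is an inner join, so ids that appear only in ads.csv are dropped. The `toPair` helper and the nested-loop matching are illustrative assumptions; Spark itself distributes the work across partitions rather than looping like this.

```scala
// Hypothetical helper: the answer's conversion, written with a pattern match.
// The limit of 2 keeps any extra commas inside the value;
// a line with no comma throws a MatchError instead of failing silently.
def toPair(line: String): (String, String) = line.split(",", 2) match {
  case Array(id, value) => (id, value)
}

val adLines = Seq(
  "5de3ae82-d56a-4f70-8738-7e787172c018,adprovider1",
  "f1b6c6f4-8221-443d-812e-de857b77b2f4,adprovider2",
  "aca88cd0-fe50-40eb-8bda-81965b377827,adprovider1",
  "940c138a-88d3-4248-911a-7dbe6a074d9f,adprovider3",
  "983bb5e5-6d5b-4489-85b3-00e1d62f6a3a,adprovider3",
  "00832901-21a6-4888-b06b-1f43b9d1acac,adprovider1",
  "9a1786e1-ab21-43e3-b4b2-4193f572acbc,adprovider1",
  "50a78218-d65a-4574-90de-0c46affbe7f3,adprovider5",
  "d9bb837f-c85d-45d4-95f2-97164c62aa42,adprovider4",
  "611cf585-a8cf-43e9-9914-c9d1dc30dab5,adprovider1"
)

val impressionLines = Seq(
  "5de3ae82-d56a-4f70-8738-7e787172c018,publisher1",
  "f1b6c6f4-8221-443d-812e-de857b77b2f4,publisher2",
  "aca88cd0-fe50-40eb-8bda-81965b377827,publisher1",
  "940c138a-88d3-4248-911a-7dbe6a074d9f,publisher3",
  "983bb5e5-6d5b-4489-85b3-00e1d62f6a3a,publisher3",
  "00832901-21a6-4888-b06b-1f43b9d1acac,publisher1",
  "9a1786e1-ab21-43e3-b4b2-4193f572acbc,publisher1",
  "611cf585-a8cf-43e9-9914-c9d1dc30dab5,publisher1"
)

val adpairs = adLines.map(toPair)
val impressionpairs = impressionLines.map(toPair)

// Inner join on id: each impression is paired with every ad that shares its id.
val result: Seq[(String, (String, String))] = for {
  (id, publisher) <- impressionpairs
  (adId, provider) <- adpairs
  if id == adId
} yield (id, (publisher, provider))

result.foreach(println)
```

Because the same `toPair` function works on any collection of lines, it can also be passed directly to `ads.map(toPair)` on an RDD[String].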