Spark Streaming - reduceByKey for a Map inside a DStream
How do I leverage reduceByKey in Spark / Spark Streaming when a normal Scala map resides within a DStream?

I have a DStream[(String, Array[(String, List)])] and I want to apply a reduceByKey-like function within the Array[(String, List)] (joining the lists together).
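Concretely, a single record and the desired merge might look like this (a hypothetical illustration; the element type of the inner lists is assumed to be String here):

    // One record as it arrives in the stream:
    val record: (String, Array[(String, List[String])]) =
      ("outerKey", Array(("a", List("x")), ("a", List("y")), ("b", List("z"))))

    // Desired result after merging the lists per inner key:
    // ("outerKey", Array(("a", List("x", "y")), ("b", List("z"))))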
I was able to do this in normal Spark by converting the outside RDD to a normal Array (to avoid a serialization error on the SparkContext object), then running a foreach and applying sc.parallelize() within the Array[(String, List)]. But since a DStream has no direct conversion to a normal Array, I am not able to apply sc.parallelize() within that component, and hence cannot use the reduceByKey function.
I'm new to Spark and Spark Streaming (and to the whole map-reduce concept, actually), so this might not be the right approach; if you can advise a better practice, please do so.
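For reference, a minimal sketch of the batch workaround described above, assuming local mode and String list elements (the sample data and all names here are hypothetical, not from the original post):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark versions

    object BatchWorkaround {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("merge-lists").setMaster("local[2]"))

        // Hypothetical outer RDD with the shape from the question.
        val outer = sc.parallelize(Seq(
          ("outerKey", Array(("a", List("x")), ("a", List("y")), ("b", List("z"))))
        ))

        // collect() pulls the data to the driver so that sc is never captured
        // inside a closure shipped to executors (the serialization error above).
        outer.collect().foreach { case (key, inner) =>
          // Re-parallelize the inner array and join the lists per key.
          val merged = sc.parallelize(inner).reduceByKey(_ ::: _).collect()
          println(s"$key -> ${merged.mkString(", ")}")
        }

        sc.stop()
      }
    }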
This is an old question and I figured it out, but... in order to be able to perform reduceByKey... operations on a DStream you must first import StreamingContext:
    import org.apache.spark.streaming.StreamingContext._
This provides implicit methods extending DStream. Once you've done that, not only can you perform the stock reduceByKey, you can also use time-sliced functions like the following (the list element type is written as String here):
    reduceByKeyAndWindow((a: List[String], b: List[String]) => a ::: b, Seconds(30), Seconds(30))
These can be quite useful if you want aggregation within a sliding window. Hope this helps!
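Putting it together, here is a minimal self-contained sketch of the windowed approach (the socket source, host/port, batch interval, and String element type are all assumptions for illustration, not from the original post):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits

    object WindowedMerge {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("windowed-merge").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical source: lines of "key value" text from a socket.
        val pairs = ssc.socketTextStream("localhost", 9999)
          .map(_.split(" "))
          .map(parts => (parts(0), List(parts(1))))

        // Join the lists per key over a 30-second window, sliding every 30 seconds.
        val merged = pairs.reduceByKeyAndWindow(
          (a: List[String], b: List[String]) => a ::: b, Seconds(30), Seconds(30))

        merged.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

Note that the window and slide durations must be multiples of the batch interval (10 seconds above).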
Tags: apache, streaming, apache-spark