Monday, 15 April 2013

apache - Spark Streaming - reduceByKey for a Map inside DStream

How do I leverage reduceByKey in Spark / Spark Streaming when a normal Scala Map resides within a DStream?

I have a DStream[(String, Array[(String, List)])] and I want to apply a reduceByKey-style function within the Array[(String, List)] (joining the lists together).

I was able to do this in normal Spark by converting the outer RDD to a normal Array (to avoid a serialization error on the SparkContext object), then running a foreach over it and applying sc.parallelize() to each inner Array[(String, List)].

But since a DStream doesn't have a direct conversion to a normal Array, I'm not able to apply sc.parallelize() to the inner component, and hence I have no reduceByKey function.
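
For reference, a minimal sketch of that batch-Spark workaround might look like the following. It assumes the inner List holds Strings (the element type isn't stated above), and the sample data and object name are made up purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object InnerReduceBatch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("inner-reduce").setMaster("local[2]"))

    // Outer data held as a plain Array on the driver, so the closure below
    // never needs to serialize the SparkContext itself.
    val outer: Array[(String, Array[(String, List[String])])] = Array(
      ("outerKey", Array(("a", List("x")), ("a", List("y")), ("b", List("z"))))
    )

    outer.foreach { case (outerKey, inner) =>
      // Parallelize the inner Array and join the lists per inner key.
      val reduced = sc.parallelize(inner).reduceByKey((a, b) => a ::: b).collect()
      println(outerKey + " -> " + reduced.mkString(", "))
    }

    sc.stop()
  }
}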

I'm new to Spark and Spark Streaming (and the whole map-reduce concept, actually), so this might not be the right way to do it; if you can advise a better practice, please do so.

This is an old question and I figured it out, but.... in order to be able to perform reduceByKey... operations on a DStream you must first import StreamingContext:

import org.apache.spark.streaming.StreamingContext._

This provides implicit methods extending DStream. Once you've done that, not only can you perform the stock reduceByKey, you can also use time-sliced functions like:

reduceByKeyAndWindow((a: List[String], b: List[String]) => a ::: b, Seconds(30), Seconds(30))

These can be quite useful if you want aggregation within a sliding window. Hope this helps!
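
Putting it together, here is a minimal end-to-end sketch in the spirit of that answer. It assumes a Spark version where the pair-DStream implicits come from the StreamingContext companion object, assumes the lists hold Strings, and the socket source, host/port, and parsing are placeholders just to produce a DStream[(String, List[String])]:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object WindowedListJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("windowed-list-join").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical input: each line is "key value"; turn it into (key, List(value)).
    val pairs = ssc.socketTextStream("localhost", 9999)
      .map(_.split(" ", 2))
      .filter(_.length == 2)
      .map(arr => (arr(0), List(arr(1))))

    // With the implicit import above, reduceByKeyAndWindow is available on the
    // pair DStream; the lists for each key are concatenated over a 30-second
    // window that also slides every 30 seconds.
    val joined = pairs.reduceByKeyAndWindow(
      (a: List[String], b: List[String]) => a ::: b,
      Seconds(30), Seconds(30))

    joined.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

On recent Spark releases the pair-DStream implicits are picked up automatically, so the explicit StreamingContext import mainly matters on the older versions this question was written against.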

apache streaming apache-spark
