Wednesday, 15 April 2015

How to get element by Index in Spark RDD (Java)




I know the method rdd.first() gives me the first element of an RDD.

There is also the method rdd.take(num), which gives me the first "num" elements.

But isn't there a way to get an element by its index?

Thanks.

This should be possible by first indexing the RDD. The transformation 'zipWithIndex' provides stable indexing, numbering each element in its original order.

Given: rdd = (a,b,c)

val withIndex = rdd.zipWithIndex // ((a,0),(b,1),(c,2))

To look up an element by index, this form is not useful. First we need to use the index as the key:

val indexKey = withIndex.map{case (k,v) => (v,k)} // ((0,a),(1,b),(2,c))

Now it's possible to use the 'lookup' action on the pair RDD to find an element by key:

val b = indexKey.lookup(1) // Seq(b)

If you expect to use lookup repeatedly on the same RDD, I'd recommend caching the indexKey RDD to improve performance.

How to do this using the Java API is an exercise left for the reader.
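For anyone who wants a starting point in Java, here is a minimal sketch of the same three steps (zipWithIndex, swap to make the index the key, then lookup). It assumes Spark's Java API is on the classpath and that Java 8 lambdas are available. The class name IndexLookup, the helper indexByPosition, and the local master setting are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical example class; the name is an assumption for this sketch.
public class IndexLookup {

    // Hypothetical helper: pairs each element with its position,
    // then swaps the pair so the position becomes the key.
    static <T> JavaPairRDD<Long, T> indexByPosition(JavaRDD<T> rdd) {
        // (a,0), (b,1), (c,2)
        JavaPairRDD<T, Long> withIndex = rdd.zipWithIndex();
        // (0,a), (1,b), (2,c)
        return withIndex.mapToPair(pair -> new Tuple2<>(pair._2(), pair._1()));
    }

    public static void main(String[] args) {
        // Local master is assumed here only so the sketch runs standalone.
        SparkConf conf = new SparkConf()
                .setAppName("IndexLookup")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c"));

            // Cache the re-keyed RDD because it may be looked up several times.
            JavaPairRDD<Long, String> indexKey = indexByPosition(rdd).cache();

            // The key type is Long, so the index is passed as a long literal.
            List<String> found = indexKey.lookup(1L);
            System.out.println(found);
        }
    }
}
```

Note that lookup returns a List rather than a single value, because a pair RDD may in general hold several values per key. With unique indices the list holds at most one element, and it is empty when the index is out of range.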

java apache-spark
