Thursday 15 March 2012

MongoDB - How to make MapReduce work with HDFS




This might sound like a stupid question. I can write MR code that takes HDFS locations as input and output, and then I don't need to worry about the parallel computing, thanks to the power of Hadoop/MR. (Please correct me if I am wrong here.)

However, if the input is not an HDFS location, and I am instead taking MongoDB data as input (mongodb://localhost:27017/mongo_hadoop.messages), running my mappers and reducers, and storing the data back in MongoDB, how does HDFS come into the picture? I mean, how can I be sure that a big file of 1 GB or any size is first being distributed on HDFS and that the parallel computing is then being done on it? Is it that the direct URI does not distribute the data, so I need to take the BSON file instead, load it onto HDFS, and give the HDFS path as input to MR? Or is the framework smart enough to do this by itself?

I am sorry if the above question is too stupid or does not make sense at all. I am new to big data but very excited to dive into this domain.

Thanks.

You are describing DBInputFormat. This is an input format that reads its splits from an external database. HDFS only gets involved in setting up the job, not in the actual input. There is also a DBOutputFormat. With an input like DBInputFormat the splits are logical, e.g. key ranges.
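To make "logical splits" a bit more concrete, here is a minimal sketch in plain Python of how a range of integer keys could be divided into contiguous sub-ranges, one per mapper. This is only an illustration of the idea, not the code that Hadoop or any MongoDB connector actually runs. The function name `key_range_splits` and its parameters are hypothetical, and it assumes the keys are integers.

```python
def key_range_splits(min_key, max_key, num_splits):
    """Divide the half-open key range [min_key, max_key) into contiguous
    sub-ranges. Hypothetical helper, for illustration only."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")

    total = max_key - min_key
    if total <= 0:
        return []  # nothing to read, so no splits

    # Never create more splits than there are keys.
    num_splits = min(num_splits, total)

    # Spread the remainder over the first `extra` splits so sizes differ by at most 1.
    base, extra = divmod(total, num_splits)

    splits = []
    start = min_key
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        splits.append((start, start + size))
        start += size
    return splits


if __name__ == "__main__":
    # Each tuple stands for the slice of keys one mapper would query
    # directly from the database; no file is copied to HDFS first.
    for lo, hi in key_range_splits(0, 1000, 4):
        print(f"mapper reads keys in [{lo}, {hi})")
```

Each mapper would then run its own query for its range against the database, which is where the parallelism comes from even though the data never sits in HDFS blocks.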

Read Database Access with Apache Hadoop for a detailed explanation.

mongodb hadoop mapreduce
