mongodb - How to make MapReduce work with HDFS
This might sound like a stupid question. As I understand it, I can write MR code that takes input and output HDFS locations, and I don't need to worry about the parallel computing power of Hadoop/MR (please correct me if I am wrong here).
However, what if the input is not an HDFS location and I am instead taking MongoDB data as input (mongodb://localhost:27017/mongo_hadoop.messages), running my mappers and reducers, and storing the output back in MongoDB? How does HDFS come into the picture? I mean, how can I be sure that a 1 GB (or similarly sized) file is first being distributed over HDFS and that parallel computing is being done on it? Does the direct URI not distribute the data, meaning I need to take a BSON file instead, load it onto HDFS, and give the HDFS path as input to MR? Or is the framework smart enough to handle this itself?
I am sorry if the above question is stupid or does not make sense at all. I am new to big data but very excited to dive into this domain.
Thanks.
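For context, the baseline case described above looks like this: when input and output are HDFS paths, the framework derives input splits from the file's HDFS blocks and runs mappers on them in parallel. A minimal word-count sketch; the paths and counting logic are illustrative, not from the question:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HdfsWordCount {

        // Each mapper receives one input split (derived from HDFS blocks),
        // so a large file is processed by many mappers in parallel.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    ctx.write(word, ONE);
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "hdfs word count");
            job.setJarByClass(HdfsWordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // HDFS paths: the framework computes splits from the file's
            // HDFS blocks, so a 1 GB file is parallelized automatically.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }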
You are describing DBInputFormat: an input format that reads its splits from an external database. HDFS only gets involved in setting up the job, not in the actual input. There is also a DBOutputFormat for writing results back. With inputs like DBInputFormat, the splits are logical, e.g. key ranges, rather than HDFS blocks.
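A minimal sketch of a DBInputFormat job against a relational source, assuming a JDBC-accessible table messages(id, body); the driver class, connection URL, and table layout are illustrative. Note that no input path is set: each split is a row range of the query, not an HDFS block.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DbInputJob {

        // One row of the source table; DBInputFormat hands each mapper the
        // rows that fall into its logical split of the query results.
        public static class MessageRecord implements Writable, DBWritable {
            long id;
            String body;

            public void readFields(ResultSet rs) throws SQLException {
                id = rs.getLong("id");
                body = rs.getString("body");
            }
            public void write(PreparedStatement ps) throws SQLException {
                ps.setLong(1, id);
                ps.setString(2, body);
            }
            public void readFields(DataInput in) throws IOException {
                id = in.readLong();
                body = in.readUTF();
            }
            public void write(DataOutput out) throws IOException {
                out.writeLong(id);
                out.writeUTF(body);
            }
        }

        public static class EchoMapper
                extends Mapper<LongWritable, MessageRecord, LongWritable, Text> {
            @Override
            protected void map(LongWritable key, MessageRecord row, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new LongWritable(row.id), new Text(row.body));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // JDBC connection details; HDFS is not the data source here.
            DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                    "jdbc:mysql://dbhost/mydb", "user", "password");

            Job job = Job.getInstance(conf, "db input job");
            job.setJarByClass(DbInputJob.class);
            job.setMapperClass(EchoMapper.class);
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            // Splits are logical: DBInputFormat counts the matching rows and
            // assigns each mapper a bounded range of the ordered query.
            DBInputFormat.setInput(job, MessageRecord.class,
                    "messages", null /* conditions */, "id" /* orderBy */,
                    "id", "body");

            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }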
Read "Database Access with Apache Hadoop" for a detailed explanation.
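For the MongoDB case in the question, the mongo-hadoop connector applies the same idea: its MongoInputFormat computes logical splits directly from the collection, so the data is read from MongoDB in parallel without first copying a BSON file onto HDFS. A sketch, assuming the mongo-hadoop connector jar is on the classpath; the class names and the mongo.input.uri / mongo.output.uri settings come from that connector, but verify them against the connector version you use, and the output collection name is made up for the example:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.bson.BSONObject;
    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;

    public class MongoJob {

        public static class MessageMapper
                extends Mapper<Object, BSONObject, Text, IntWritable> {
            @Override
            protected void map(Object id, BSONObject doc, Context ctx)
                    throws IOException, InterruptedException {
                // "author" is a hypothetical field in the messages collection.
                ctx.write(new Text(String.valueOf(doc.get("author"))),
                          new IntWritable(1));
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : vals) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Input and output are MongoDB URIs, not HDFS paths; the
            // connector computes its own splits over the collection.
            conf.set("mongo.input.uri",
                    "mongodb://localhost:27017/mongo_hadoop.messages");
            conf.set("mongo.output.uri",
                    "mongodb://localhost:27017/mongo_hadoop.message_counts");

            Job job = Job.getInstance(conf, "mongo mapreduce");
            job.setJarByClass(MongoJob.class);
            job.setMapperClass(MessageMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setInputFormatClass(MongoInputFormat.class);
            job.setOutputFormatClass(MongoOutputFormat.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }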
Tags: mongodb, hadoop, mapreduce