Thursday 15 September 2011

mongodb - Retrieving Mongo docs for a large fixed set of identifiers




I have a Mongo DB with 200M+ documents. Each document has a "name" field (indexed), which is a string, and an "items" field (not indexed), which is an array of integers. The size of the array can range from 1 to 100.

Say I have a text file with 1M names. I need to create a text file containing the "items" for each of the 1M names.

Options:

1. Just iterate through the names one at a time and extract the items based on _id.
2. Create "batches" of small sets of names (say 100 at a time) and query the db using the $in operator, then iterate through the returned documents one by one.
3. Use some sort of map-reduce to break up the 1M names and query them in parallel.

What is the most efficient way to do this?
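For reference, option 2 (batched $in queries) could look roughly like the sketch below. It assumes a pymongo-style collection object whose find(filter, projection) returns an iterable of documents; the helper names batched and fetch_items, the batch size, and the commented-out connection details are made up for illustration and are not from the original question.

```python
from itertools import islice


def batched(iterable, size):
    """Yield consecutive lists of at most `size` elements from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_items(collection, names, batch_size=100):
    """Yield (name, items) for every name that exists, one $in query per batch.

    `collection` is assumed to behave like a pymongo Collection.
    Names that are not in the database are silently skipped.
    """
    projection = {"_id": 0, "name": 1, "items": 1}
    for chunk in batched(names, batch_size):
        # The indexed "name" field lets the server resolve each batch by index lookup.
        for doc in collection.find({"name": {"$in": chunk}}, projection):
            yield doc["name"], doc["items"]


# Hypothetical usage (database, collection, and file names are assumptions):
#
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["mycoll"]
#   with open("names.txt") as src:
#       names = (line.strip() for line in src if line.strip())
#       for name, items in fetch_items(coll, names):
#           print(name, items)
```

Note that documents come back in whatever order the server returns them within a batch, not in the order of the input file.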

This is hard to answer without trying and profiling.

Since the arrays are small, and assuming every name is found, a brute-force scan of the database in natural order may be faster than any of the options you suggested.

Using a parallel scan (http://docs.mongodb.org/manual/reference/command/parallelcollectionscan/) you can iterate over all the documents. You can hold the 1M names in memory, and about once every 200 records (200M documents / 1M names) you'll find a match to write to the output text file.
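A minimal sketch of the scan-and-filter idea follows. It uses a plain single-cursor find() rather than the parallel scan command, which may not be available on every server version, so treat it as a simplified variant of the approach above. The function names, the one-name-per-line input file, and the tab-separated output format are assumptions for illustration only; the collection is assumed to be pymongo-like.

```python
def load_names(path):
    """Read one name per line into a set for O(1) membership tests."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def scan_and_write(collection, names, out):
    """Scan every document once and write matches as 'name<TAB>i1,i2,...'.

    `collection` is assumed to behave like a pymongo Collection and
    `out` is any writable text file object. Returns the number of matches.
    """
    written = 0
    # An empty filter with no sort lets the server walk the collection
    # without using the "name" index.
    cursor = collection.find({}, {"_id": 0, "name": 1, "items": 1})
    for doc in cursor:
        if doc["name"] in names:
            items = ",".join(str(i) for i in doc["items"])
            out.write(doc["name"] + "\t" + items + "\n")
            written += 1
    return written


# Hypothetical usage (database, collection, and file names are assumptions):
#
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["mycoll"]
#   names = load_names("names.txt")
#   with open("items.txt", "w", encoding="utf-8") as out:
#       scan_and_write(coll, names, out)
```

A set of 1M short strings should fit comfortably in memory on most machines, but whether the full scan actually beats the indexed lookups still has to be measured.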

mongodb mongodb-query
