Sunday, 15 January 2012

python - PyMongo’s bulk write operation features with generators -




I would like to use PyMongo’s bulk write operation features, which execute write operations in batches in order to reduce the number of network round trips and increase write throughput.

I found here that it is possible to use 5000 as the batch size.

However, I do not know what the best batch size is, or how to combine PyMongo’s bulk write operation features with generators in the following code:

from pymongo import MongoClient
from itertools import groupby
import csv


def iter_something(rows):
    # The first six columns identify a document; the last two describe one 'chr' entry.
    key_names = ['type', 'name', 'sub_name', 'pos', 's_type', 'x_type']
    chr_key_names = ['letter', 'no']
    # groupby only merges consecutive rows, so the input must already be ordered by key.
    for keys, group in groupby(rows, lambda row: row[:6]):
        result = dict(zip(key_names, keys))
        result['chr'] = [dict(zip(chr_key_names, row[6:])) for row in group]
        yield result


def main():
    # One converter per column, applied in order.
    converters = [str, str, str, int, int, int, str, int]
    with open("/home/mic/tmp/test.txt") as c:
        reader = csv.reader(c, skipinitialspace=True)
        # Generator expression: rows are converted lazily, one at a time.
        converted = ([conv(col) for conv, col in zip(converters, row)] for row in reader)
        for object_ in iter_something(converted):
            print(object_)


if __name__ == '__main__':
    db = MongoClient().test
    sdb = db.snps
    main()

test.txt file:

test, a, b01, 828288, 1, 7, c, 5
test, a, b01, 828288, 1, 7, t, 6
test, a, b01, 171878, 3, 7, c, 5
test, a, b01, 171878, 3, 7, t, 6
test, a, b01, 871963, 3, 9, a, 5
test, a, b01, 871963, 3, 9, g, 6
test, a, b01, 1932523, 1, 10, t, 4
test, a, b01, 1932523, 1, 10, a, 5
test, a, b01, 1932523, 1, 10, x, 6
test, a, b01, 667214, 1, 14, t, 4
test, a, b01, 667214, 1, 14, g, 5
test, a, b01, 67214, 1, 14, g, 6
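
For reference, here is a minimal sketch of what the grouping step produces for the first four rows of this sample. It assumes Python 3 and reads the rows from an in-memory string instead of the file; the names SAMPLE and parse are made up for this illustration and are not part of the question's code.

```python
import csv
import io
from itertools import groupby

# First four rows of the sample file, held in memory for the illustration.
SAMPLE = """\
test, a, b01, 828288, 1, 7, c, 5
test, a, b01, 828288, 1, 7, t, 6
test, a, b01, 171878, 3, 7, c, 5
test, a, b01, 171878, 3, 7, t, 6
"""

KEY_NAMES = ['type', 'name', 'sub_name', 'pos', 's_type', 'x_type']
CHR_KEY_NAMES = ['letter', 'no']
CONVERTERS = [str, str, str, int, int, int, str, int]


def iter_something(rows):
    # Consecutive rows that share the first six columns become one document.
    for keys, group in groupby(rows, lambda row: row[:6]):
        result = dict(zip(KEY_NAMES, keys))
        result['chr'] = [dict(zip(CHR_KEY_NAMES, row[6:])) for row in group]
        yield result


def parse(text):
    # Hypothetical helper: lazily convert each CSV row to typed values.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return ([conv(col) for conv, col in zip(CONVERTERS, row)] for row in reader)


docs = list(iter_something(parse(SAMPLE)))
for doc in docs:
    print(doc)
```

Each pair of rows collapses into a single document whose 'chr' list holds one entry per original row, so these four rows yield two documents.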

You can do:

sdb.insert(iter_something(converted))

PyMongo does the right thing: it iterates your generator until it has yielded 1000 documents or 16MB of data, then pauses the generator while it inserts the batch into MongoDB. Once the batch is inserted, PyMongo resumes your generator to create the next batch, and continues until all documents are inserted. Then insert() returns a list of the inserted document ids.

Initial support for generators was added to PyMongo in this commit, and we've maintained support for document generators ever since.
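
If you prefer to control the batch size yourself, the count-based part of this behaviour can be approximated by hand. The following is only a sketch, not PyMongo's internal implementation: the helpers batches and insert_in_batches are hypothetical names, the default of 1000 is simply the figure quoted above rather than a tuned value, and the 16MB size limit is not modelled. It assumes a PyMongo 3 or later collection, where insert_many() accepts an iterable of documents and returns a result with an inserted_ids attribute (insert() was deprecated in PyMongo 3 and removed in PyMongo 4).

```python
from itertools import islice


def batches(docs, batch_size=1000):
    """Yield lists of at most batch_size documents from any iterable."""
    it = iter(docs)
    while True:
        # islice pulls only the next batch_size items, so the source
        # generator stays paused between batches.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, docs, batch_size=1000):
    """Insert docs one batch at a time and return all inserted ids."""
    inserted_ids = []
    for batch in batches(docs, batch_size):
        # Assumes a PyMongo 3+ collection object.
        result = collection.insert_many(batch)
        inserted_ids.extend(result.inserted_ids)
    return inserted_ids
```

With the question's code this would be called as insert_in_batches(sdb, iter_something(converted), batch_size=5000). Which batch size is best depends on document size and network latency, so it is worth measuring a few values against your own data.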

python mongodb python-2.7 pymongo bulkinsert
