Saturday 15 June 2013

reporting - Schema design for MongoDB pre-aggregated reports




I'm following the official MongoDB docs (http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/) on pre-aggregated reports. According to the tutorial, a pre-aggregated document should look like this:

{
    _id: "20101010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    hourly: {
        "0": 227850,
        "1": 210231,
        ...
        "23": 20457 },
    minute: {
        "0": {
            "0": 3612,
            "1": 3241,
            ...
            "59": 2130 },
        "1": {
            "0": ..., },
        ...
        "23": {
            "59": 2819 } }
}
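To make the layout concrete, here is a minimal sketch of how a single page hit might be turned into an upsert against such a document. The helper name `buildHitUpdate` and the collection name `stats` are assumptions for illustration, not something taken from the tutorial; the point is only to show that the `_id` string is derived from the same values that go into `metadata`.

```javascript
// Hypothetical helper: builds the upsert filter and update for one page hit.
function buildHitUpdate(date, site, page) {
  // Day part of the _id in UTC, e.g. "20101010".
  const day = date.toISOString().slice(0, 10).replace(/-/g, "");
  const hour = date.getUTCHours();
  const minute = date.getUTCMinutes();
  return {
    filter: {
      // The _id repeats the information held in metadata.
      _id: `${day}/${site}${page}`,
      metadata: {
        // Midnight (UTC) of the day the hit belongs to.
        date: new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate())),
        site: site,
        page: page,
      },
    },
    update: {
      // Bump the hourly counter and the per-minute counter in one operation.
      $inc: {
        [`hourly.${hour}`]: 1,
        [`minute.${hour}.${minute}`]: 1,
      },
    },
  };
}

// Usage in the mongo shell (collection name "stats" is an assumption):
// const hit = buildHitUpdate(new Date(), "site-1", "/apache_pb.gif");
// db.stats.updateOne(hit.filter, hit.update, { upsert: true });
```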

The thing is, I'm already using this approach and have my data stored this way. Now I want to add another dimension to the metadata subdocument, and I'm reconsidering the whole thing.

My question is: is there a reason to build the _id attribute from the same information that is stored in the metadata attribute? Wouldn't it be enough to create a (unique) compound index on metadata and use an ObjectId as the _id key?
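The alternative asked about here could be sketched roughly as follows. The names `metadataIndex` and `metadataFilter` and the collection name `stats` are assumptions made for illustration. The filter uses dot notation rather than matching the whole `metadata` subdocument, so the documents are identified by their metadata fields while `_id` is left to be generated as an ObjectId when the upsert inserts.

```javascript
// Assumed index definition: one key per metadata dimension, unique as a whole.
const metadataIndex = {
  keys: { "metadata.date": 1, "metadata.site": 1, "metadata.page": 1 },
  options: { unique: true },
};

// Hypothetical helper: upsert filter that identifies a document by its
// metadata fields only, so no hand-built _id is needed.
function metadataFilter(day, site, page) {
  return { "metadata.date": day, "metadata.site": site, "metadata.page": page };
}

// Usage in the mongo shell (collection name "stats" is an assumption):
// db.stats.createIndex(metadataIndex.keys, metadataIndex.options);
// db.stats.updateOne(
//   metadataFilter(ISODate("2010-10-10T00:00:00Z"), "site-1", "/apache_pb.gif"),
//   { $inc: { "hourly.13": 1, "minute.13.5": 1 } },
//   { upsert: true }
// );
```

With this layout, adding a dimension would mean adding one more key to both the filter and the unique index, instead of changing how an `_id` string is assembled.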

Thanks!

Another way ;)

You can create a simple collection:

{ "ts": "unix timestamp", "site": "site-1", "page": "/apache_pb.gif" }

This collection performs well on insert.

Then use a more complex aggregate query (with an aggregation time grain):

db.test.aggregate([
    // grain = ts/3600 minus its fractional part, i.e. the hour index since the epoch
    { "$project": {
        "ts": 1,
        "_id": 0,
        "grain": { "$subtract": [
            { "$divide": [ "$ts", 3600 ] },
            { "$mod": [ { "$divide": [ "$ts", 3600 ] }, 1 ] } ] },
        "site": 1,
        "page": 1 } },
    // one document per distinct site/page/grain combination
    { "$group": {
        "_id": { "site": "$site", "page": "$page", "grain": "$grain" } } },
    // tsum = number of those combinations in each grain
    { "$group": {
        "tsum": { "$sum": 1 },
        "_id": { "grain": "$_id.grain" } } },
    { "$project": { "tsum": "$tsum", "_id": 0, "grain": "$_id.grain" } },
    { "$sort": { "grain": 1 } }
])

This aggregates statistics per hour (3600 seconds in this example).
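The grain arithmetic in the `$project` stage can be checked outside the database with a small sketch. The function names `hourGrain` and `grainStart` are made up for this illustration, and `ts` is assumed to be a numeric Unix timestamp in seconds.

```javascript
// Same arithmetic as the $project stage: ts/3600 minus its fractional part,
// which gives the whole number of hours since the epoch.
function hourGrain(ts) {
  const hours = ts / 3600;
  return hours - (hours % 1);
}

// Convert a grain back to the Unix timestamp at which that hour starts.
function grainStart(grain) {
  return grain * 3600;
}
```

Changing 3600 to another number of seconds (for example 60 or 86400) would give a different grain with the same pipeline shape.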

IMHO this is a simpler and more manageable solution that avoids a complex data model while still performing well (don't forget the index).

mongodb reporting data-modeling
