MongoDB: How to get the most recent fields of an embedded document (non-array)


Maybe I'm going against the grain here, but I've structured my data so that a message thread lives inside a document, and the messages live inside an embedded document (not a subdocument array).

I would like to be able to sort and limit the embedded document by timestamp.

For example, the second document is rather large, and I'd like to retrieve only the last 10 messages (or whatever number) between Bob and myself.

    {
        "_id" : ObjectId("2bjbkjb4234j134124"),
        "messages" : {
            "56a7b13f24236dea1247cdc7" : {
                "authorname" : "nick",
                "timestamp" : 1.453699391078e12,
                "message" : "hello"
            },
            ... 5 more messages
        }
    },
    {
        "_id" : ObjectId("3e11kjb4234j134172"),
        "messages" : {
            "5727b13f24236dea1247ced8" : {
                "authorname" : "bob",
                "timestamp" : 1.2353453455078e12,
                "message" : "sup!"
            },
            ... 50,000 messages
        }
    }

Question:

Is there an equivalent of sort, limit and return for an embedded document (like messages above)?

You should be using arrays here. Using named object keys is counterintuitive to how the database works.

Aside from the basic querying problems, such as looking for content by author "bob" in the collection (simple with arrays), you have similar "brute force" matching problems in looking for the "last 10". Not to mention that with a "non-array" it becomes subjective what the "last ten" actually is.

Even taking your example and supposing these "keys" are the same generated values as MongoDB ObjectId values (and therefore monotonic and increasing in value), working out the sort order of these requires brute force JavaScript processing with no assistance from collection indexes or natural array index positions:

    db.collection.mapReduce(
        function() {
            var messages = this.messages;
            // sort the keys, keep the last 10, then look up each message
            var newMessages = Object.keys(this.messages).sort().slice(-10).map(
                function(id) {
                    return messages[id];
                }
            );

            emit(this._id, { "messages": newMessages });
        },
        function() {},   // not reducing here
        { "out": { "inline": 1 } }
    )

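Or similar juggling with the "timestamp" values (which are stored as plain numbers, not as a date type). The basic premise here is turning something that is not an array into an array, in order to sort the results and limit what you want to return.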

Basically this is ugly, and bad design. It uses mapReduce purely as a method (via JavaScript processing) of altering the structure of the document returned. The same logic may be performed in the client, but doing it on the server has the advantage of stripping the unwanted content before sending it over the network connection.
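For comparison, here is a minimal sketch of that same juggling done in the client instead, assuming the whole document has already been fetched. The helper name `lastMessages` and the sample data are made up for illustration, and it sorts on the numeric "timestamp" field rather than on the keys:

```javascript
// Hypothetical helper: return the last `n` messages (n > 0) from a document whose
// "messages" field is an object keyed by id, as in the question.
function lastMessages(doc, n) {
  return Object.keys(doc.messages)
    .map(function (id) {
      // carry the key along so it is not lost when flattening to an array
      return Object.assign({ _id: id }, doc.messages[id]);
    })
    .sort(function (a, b) { return a.timestamp - b.timestamp; })
    .slice(-n);
}

// Made-up sample thread, deliberately out of timestamp order
var thread = {
  messages: {
    "a1": { authorname: "nick", timestamp: 3, message: "third" },
    "a2": { authorname: "bob", timestamp: 1, message: "first" },
    "a3": { authorname: "nick", timestamp: 2, message: "second" }
  }
};

var recent = lastMessages(thread, 2);
```

Note that every message still crosses the wire before the client throws most of them away, which is the cost the mapReduce version avoids.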

The idea that using arrays imposes overhead on "updating" content is "bunk" as well. MongoDB has supported matched position updating since inception, and when the data is structured correctly the usage is straightforward:

    {
        "_id" : ObjectId("2bjbkjb4234j134124"),
        "messages" : [
            {
                "_id": "56a7b13f24236dea1247cdc7",
                "authorname" : "nick",
                "timestamp" : 1.453699391078e12,
                "message" : "hello"
            },
            // etc
        ]
    }

So if you wanted to match and update a specific array entry (assuming the identifier is unique everywhere, otherwise it is a matter of tuning "per-document" if needed), you apply the identifier in the query portion and the positional $ operator in the "update" portion of the statement:

    db.collection.update(
        { "messages._id": "56a7b13f24236dea1247cdc7" },
        { "$set": {
            "messages.$.message": "something new",
            "messages.$.timestamp": aNewValue   // placeholder for the new timestamp
        }}
    )

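Adding items to arrays using $push has the advantage that the "newest" entries are added to the end of the array by default. So unless you change that (and as long as you don't modify entries and then want the latest by timestamp), all you need is $slice on the "already an array" content, without further juggling: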

    db.collection.find(
        {},
        { "messages": { "$slice": -10 } }
    )

If you wanted a modified field such as "timestamp" to affect the ordering, then store it that way using the $sort modifier to $push. You can apply this to modified array elements with a simple application of Bulk operations:

    var bulk = db.collection.initializeOrderedBulkOp();

    // update the matched element
    bulk.find({
        "_id": ObjectId("2bjbkjb4234j134124"),
        "messages._id": "56a7b13f24236dea1247cdc7"
    }).updateOne({
        "$set": {
            "messages.$.message": "something new",
            "messages.$.timestamp": aNewValue   // placeholder for the new timestamp
        }
    });

    // then sort the array on timestamp
    bulk.find({
        "_id": ObjectId("2bjbkjb4234j134124"),
        "messages._id": "56a7b13f24236dea1247cdc7"
    }).updateOne({
        "$push": { "messages": { "$each": [], "$sort": { "timestamp": 1 } } }
    });

    // send to and receive from the server
    bulk.execute();

Which, while it is two update statements (since you cannot modify the same document path with two operators in a single update operation), still works out as a single request and response to the server, and is therefore pretty efficient.
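When you are appending a new message rather than editing an existing one, the $sort modifier can ride along with the $push itself, so a single statement is enough. A rough sketch, where the helper name `appendMessage` is hypothetical and `collection` is assumed to be any object exposing `updateOne(filter, update)`, such as `db.collection` in a 3.2+ shell:

```javascript
// Hypothetical helper: push one message and keep the array ordered by timestamp.
// `collection` is assumed to expose updateOne(filter, update).
function appendMessage(collection, threadId, message) {
  return collection.updateOne(
    { "_id": threadId },
    { "$push": {
      "messages": {
        "$each": [message],           // $sort is only accepted alongside $each
        "$sort": { "timestamp": 1 }   // ascending, so the newest ends up last
      }
    }}
  );
}
```

With the array kept in that order, the plain $slice projection of -10 keeps returning the ten latest entries by timestamp.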

And of course, if you did not want to store the order permanently, arrays can at least be manipulated in the aggregation framework, in a manner more efficient than processing via the JavaScript of mapReduce:

    db.collection.aggregate([
        // $match takes a query document, so name the field being matched
        { "$match": { "_id": ObjectId("2bjbkjb4234j134124") } },
        { "$unwind": "$messages" },
        { "$sort": { "messages.timestamp": -1 } },    // reverse order, so $limit keeps the newest
        { "$limit": 10 },
        { "$group": {
            "_id": "$_id",
            "messages": { "$push": "$messages" }
        }}
    ])

Or get super fancy over multiple documents with the new MongoDB 3.2 operators:

    db.collection.aggregate([
        { "$unwind": "$messages" },
        { "$sort": { "_id": 1, "messages.timestamp": 1 } },
        { "$group": {
            "_id": "$_id",
            "messages": { "$push": "$messages" }
        }},
        { "$project": {
            "messages": { "$slice": [ "$messages", -10 ] }
        }}
    ])

But the most performant consideration in all cases is that the data should:

  1. Be an "array", and not nested under the named keys of an object.

  2. Ideally be stored in the order of the most common use case for access on reading.

The final thing to look at here is that if you "really" intend to store 50,000 messages in an array or a single document (because no-one ever exaggerates wildly when asking questions on StackOverflow), then these are better off existing in their own collection. Even if the BSON document limit is not exceeded (and it may well be), the performance considerations are indeed terrible.

Considering the usage patterns of your data should be the prime objective here. Just because you "can" store referenced documents within another does not mean you should. Unless you have a use case where "all of them" (and definitely never 50,000) are needed in one request, you should not be doing so.
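As a rough sketch of the separate-collection approach, assuming each message becomes its own document carrying a reference back to its thread. The `thread_id` field and the `recentMessages` helper are made-up names, and `collection` is assumed to expose the usual `find().sort().limit()` cursor chain:

```javascript
// Hypothetical helper: the last `n` messages of one thread, newest first.
// Assumes one document per message, shaped like
//   { _id, thread_id, authorname, timestamp, message }
function recentMessages(collection, threadId, n) {
  return collection
    .find({ "thread_id": threadId })
    .sort({ "timestamp": -1 })
    .limit(n);
}
```

In the shell that would be called as `recentMessages(db.messages, threadId, 10)`, where `db.messages` is again an assumed collection name. A compound index on `{ "thread_id": 1, "timestamp": -1 }` should let the server answer this from the index rather than sorting in memory.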

