Saturday, 15 August 2015

ElasticSearch filtering by field1 THEN field2 THEN take max of field3

I am struggling to get the information I need from Elasticsearch.

My log statements look like this:

field1: example field2: example2 field3: example3

I would like to search a timeframe (the last 24 hours) to find all data that has this in field1 and that in field2.

There may be multiple this.that.[field3] entries, so I want to return only the maximum of that field.

In fact, in my data, field3 is the key of the entry.

What is the best way of retrieving the information I need? I have managed to get results returned using aggs, but the data is in buckets, and I am only interested in the data with the max value of field3.

I have added an example of the query I am looking to do:

{
  "size": 0,
  "aggs": {
    "agg_129": {
      "filters": { "filters": { "carname: toyota": { "query": { "query_string": { "query": "carname: toyota" } } } } },
      "aggs": {
        "agg_130": {
          "filters": { "filters": { "attribute: timeused": { "query": { "query_string": { "query": "attribute: timeused" } } } } },
          "aggs": {
            "agg_131": { "terms": { "field": "@timestamp", "size": 0, "order": { "_count": "desc" } } }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": "2014-10-27T00:00:00.000Z", "lte": "2014-10-28T23:59:59.999Z" } } }
          ],
          "must_not": []
        }
      }
    }
  }
}

So, the example above just shows the documents that have carname = toyota and attribute = timeused.

My data is as follows:

There are x cars (carname), each car has y attributes, and each of those attributes has a document with a timestamp.

To begin with, I was looking for a query for carname.attribute.timestamp (latest). However, if I am able to use one query to get the latest timestamp for every attribute of every carname, that would decrease the query calls from ~50 to one.

If you are using Elasticsearch v1.3+, you can add a top_hits aggregation with the parameter size: 1 and a descending sort on the field3 value.

This will return the whole document with the maximum value on that field, as you wish.

This example in the documentation might do the trick.
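
As a rough, untested sketch of the top_hits suggestion above, the request body could look like the one below. The placeholder values "this" and "that" come from the question; the aggregation name max_field3_doc is made up for illustration, and the term filters assume that field1 and field2 are indexed as exact (not_analyzed) values.

```json
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "field1": "this" } },
            { "term": { "field2": "that" } },
            { "range": { "@timestamp": { "gte": "now-24h" } } }
          ]
        }
      }
    }
  },
  "aggs": {
    "max_field3_doc": {
      "top_hits": {
        "size": 1,
        "sort": [ { "field3": { "order": "desc" } } ]
      }
    }
  }
}
```

With size: 1 and a descending sort, the single hit inside the aggregation should be the document with the largest field3 among those matching the filters.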


OK, it seems you don't need the whole document, only the maximum timestamp value. You can use a max aggregation instead of a top_hits one.

The following query (not tested) should give you the maximum timestamp value for each of the top 10 attribute values of each of the top 10 carname values, in one request.

A terms aggregation is like a GROUP BY clause, so you should not have to query 50 times to retrieve the values of each carname/attribute combination: that is the point of nesting a terms aggregation for attribute inside the carname aggregation.

Note that, to work properly, the carname and attribute fields should be not_analyzed. If that is not the case, you will get "funny" results in your buckets. The problem (and a possible solution) is described here.
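
To make the not_analyzed requirement concrete, here is a minimal sketch of a mapping, assuming Elasticsearch 1.x string fields. The type name car_event is hypothetical, and the body is meant to be sent when creating the index.

```json
{
  "mappings": {
    "car_event": {
      "properties": {
        "carname": { "type": "string", "index": "not_analyzed" },
        "attribute": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

With a mapping like this, each carname and attribute value is indexed as a single term, so the terms buckets hold whole values rather than individual tokens. The index setting of a field that already exists cannot be changed in place, so existing data would likely need to be reindexed.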

Feel free to change the size parameter of the terms aggregations to fit your case.

{
  "size": 0,
  "aggs": {
    "by_carnames": {
      "terms": { "field": "carname", "size": 10 },
      "aggs": {
        "by_attribute": {
          "terms": { "field": "attribute", "size": 10 },
          "aggs": {
            "max_timestamp": { "max": { "field": "@timestamp" } }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": "2014-10-27T00:00:00.000Z", "lte": "2014-10-28T23:59:59.999Z" } } }
          ]
        }
      }
    }
  }
}

