Saturday, 15 August 2015

ElasticSearch filtering by field1 THEN field2 THEN take max of field3




I am struggling to get the information I need out of Elasticsearch.

My log statements look like this:

field1: example1 field2: example2 field3: example3

I want to search over a timeframe (the last 24 hours) to find all entries that have "this" in field1 and "that" in field2.

There may be multiple this.that.[field3] entries, so I want to return only the maximum of that field.

In fact, in my data, field3 is the key of the entry.

What is the best way of retrieving the information I need? I have managed to get results returned using aggs, but the data is in buckets, and I am only interested in the entry with the max value of field3.

I have added an example of the query I am looking to do: https://jsonblob.com/54535d49e4b0d117eeaf6bb4

{
  "size": 0,
  "aggs": {
    "agg_129": {
      "filters": {
        "filters": {
          "carname: toyota": { "query": { "query_string": { "query": "carname: toyota" } } }
        }
      },
      "aggs": {
        "agg_130": {
          "filters": {
            "filters": {
              "attribute: timeused": { "query": { "query_string": { "query": "attribute: timeused" } } }
            }
          },
          "aggs": {
            "agg_131": {
              "terms": { "field": "@timestamp", "size": 0, "order": { "_count": "desc" } }
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": "2014-10-27T00:00:00.000Z", "lte": "2014-10-28T23:59:59.999Z" } } }
          ],
          "must_not": []
        }
      }
    }
  }
}

So, the example above shows only the documents that have carname = toyota and attribute = timeused.

My data is as follows:

There are X cars (carname), each car has Y attributes, and each of those attributes has a document with a timestamp.

To begin with, I was looking for a query for carname.attribute.timestamp (latest). However, if I were able to use just one query to get the latest timestamp for every attribute of every carname, that would reduce the number of query calls from ~50 to one.

If you are using Elasticsearch v1.3+, you can add a top_hits aggregation with the parameter size: 1 and a descending sort on the field3 value.

This will return the whole document with the maximum value of that field, as you wish.

This example in the documentation might do the trick.
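As a rough sketch of the top_hits approach (not tested), assuming field3 is the @timestamp field and reusing the carname and attribute field names and the time range from the question, the request might look something like the following. The aggregation names by_carname, by_attribute and latest_doc are placeholders chosen for illustration.

```json
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "2014-10-27T00:00:00.000Z",
            "lte": "2014-10-28T23:59:59.999Z"
          }
        }
      }
    }
  },
  "aggs": {
    "by_carname": {
      "terms": { "field": "carname", "size": 10 },
      "aggs": {
        "by_attribute": {
          "terms": { "field": "attribute", "size": 10 },
          "aggs": {
            "latest_doc": {
              "top_hits": {
                "size": 1,
                "sort": [ { "@timestamp": { "order": "desc" } } ]
              }
            }
          }
        }
      }
    }
  }
}
```

Each by_attribute bucket would then carry a single hit under latest_doc: the document with the most recent @timestamp for that carname/attribute pair.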

Edit:

OK, it seems you don't need the whole document, only the maximum timestamp value. You can use a max aggregation instead of the top_hits one.
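For a single carname/attribute pair, a minimal sketch (not tested) could filter on both fields and ask for one max value. The term filters assume the values are indexed as the single lowercase tokens toyota and timeused, and max_timestamp is an arbitrary aggregation name.

```json
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "carname": "toyota" } },
            { "term": { "attribute": "timeused" } },
            {
              "range": {
                "@timestamp": {
                  "gte": "2014-10-27T00:00:00.000Z",
                  "lte": "2014-10-28T23:59:59.999Z"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "max_timestamp": { "max": { "field": "@timestamp" } }
  }
}
```

The result comes back under aggregations.max_timestamp.value; for a date field this is a number of milliseconds since the epoch.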

The nested terms query below (not tested) should give you the maximum timestamp value for each of the top 10 attribute values of each of the top 10 carname values, in only one request.

The terms aggregation works like a GROUP BY clause, so you should not have to query 50 times to retrieve the values for each carname/attribute combination: that is the point of nesting the terms aggregation on attribute inside the carname aggregation.

Note that, for this to work properly, the carname and attribute fields should be not_analyzed. If that is not the case, you will get "funny" results in your buckets. The problem (and a possible solution) is described here.

Feel free to change the size parameter of the terms aggregations to fit your case.

{
  "size": 0,
  "aggs": {
    "by_carnames": {
      "terms": { "field": "carname", "size": 10 },
      "aggs": {
        "by_attribute": {
          "terms": { "field": "attribute", "size": 10 },
          "aggs": {
            "max_timestamp": { "max": { "field": "@timestamp" } }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": "2014-10-27T00:00:00.000Z", "lte": "2014-10-28T23:59:59.999Z" } } }
          ]
        }
      }
    }
  }
}
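The not_analyzed requirement mentioned above could be met with an index mapping along these lines. This is only a sketch for the Elasticsearch 1.x string type. The mapping type name logs is hypothetical, and the mapping has to be in place before the documents are indexed, so existing data would need to be reindexed.

```json
{
  "mappings": {
    "logs": {
      "properties": {
        "carname":   { "type": "string", "index": "not_analyzed" },
        "attribute": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

With a mapping like this, a multi-word value (say, "toyota corolla", used here purely as an illustration) stays a single bucket key instead of being split into one bucket per word.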

elasticsearch
