Wednesday, 15 April 2015

python - Elasticsearch full-text autocomplete




I'm using Elasticsearch through the Python requests library. I've set up my analysers like so:

"analysis": {
  "analyzer": {
    "my_basic_search": {
      "type": "standard",
      "stopwords": []
    },
    "my_autocomplete": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": ["lowercase", "autocomplete"]
    }
  },
  "filter": {
    "autocomplete": {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 20
    }
  }
}

I've got a list of artists that I'd like to search using autocomplete. My current test case is 'bill w', which should match 'bill withers' etc. The artist mapping looks like this (this is the output of GET http://localhost:9200/my_index/artist/_mapping):

{
  "my_index" : {
    "mappings" : {
      "artist" : {
        "properties" : {
          "clean_artist_name" : {
            "type" : "string",
            "analyzer" : "my_basic_search",
            "fields" : {
              "autocomplete" : {
                "type" : "string",
                "index_analyzer" : "my_autocomplete",
                "search_analyzer" : "my_basic_search"
              }
            }
          },
          "submitted_date" : { "type" : "date", "format" : "basic_date_time" },
          "total_count" : { "type" : "integer" }
        }
      }
    }
  }
}

...and I then run this query to do the autocomplete:

"query": {
  "function_score": {
    "query": {
      "bool": {
        "must": { "match": { "clean_artist_name.autocomplete": "bill w" } },
        "should": { "match": { "clean_artist_name": "bill w" } }
      }
    },
    "functions": [
      { "script_score": { "script": "artist-score" } }
    ]
  }
}
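Since the request is sent from Python anyway, one option is to build the body as a dict and let the json module serialise it, which rules out hand-written JSON slips such as a trailing comma. This is only a sketch: the helper name build_autocomplete_query is made up for illustration, and the _search URL in the comment is an assumption based on the index and type names above.

```python
import json


def build_autocomplete_query(text, script="artist-score"):
    """Return the search body from the question as a Python dict."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": {
                            "match": {"clean_artist_name.autocomplete": text}
                        },
                        "should": {
                            "match": {"clean_artist_name": text}
                        },
                    }
                },
                "functions": [{"script_score": {"script": script}}],
            }
        }
    }


# json.dumps always emits valid JSON, so the body cannot contain a stray comma.
body = json.dumps(build_autocomplete_query("bill w"))

# Assumed endpoint, not taken from the post:
# requests.post("http://localhost:9200/my_index/artist/_search", data=body)
```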

This query seems to match artists that contain either 'bill' or 'w' as well as 'bill withers', whereas I only wanted to match artists that contain that exact string. The analyser seems to be working fine; here is the output of http://localhost:9200/my_index/_analyze?analyzer=my_autocomplete&text=bill%20w:

{ "tokens" : [
  { "token" : "b", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1 },
  { "token" : "bi", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1 },
  { "token" : "bil", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1 },
  { "token" : "bill", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1 },
  { "token" : "bill ", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1 },
  { "token" : "bill w", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1 }
] }
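As a rough cross-check of the token list above, the my_autocomplete chain (keyword tokenizer, lowercase, edge_ngram with min_gram 1 and max_gram 20) can be imitated in a few lines of Python. This is only a sketch: the function name edge_ngrams is made up for illustration, and it neither calls Elasticsearch nor reproduces offsets and positions.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Approximate keyword tokenizer + lowercase + edge_ngram filter."""
    # The keyword tokenizer keeps the whole input as a single token,
    # so the space in "bill w" stays inside the token.
    token = text.lower()
    longest = min(max_gram, len(token))
    # One prefix per length from min_gram up to max_gram (or the token length).
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("bill w"))
# ['b', 'bi', 'bil', 'bill', 'bill ', 'bill w']
```

Applied to a full artist name such as 'bill withers', the same function yields every prefix of the lowercased name, which is what ends up indexed in the autocomplete sub-field.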

So why is this not excluding matches that only have 'bill' or 'w' in them? Is there something in my query that is allowing results to match via the my_basic_search analyser?

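I believe you need a "term" query instead of a "match" one for the "must" clause. A "match" query analyses the search text first (here with my_basic_search, which splits 'bill w' into 'bill' and 'w' and accepts either), whereas "term" looks the text up as-is. You have already split your artist names into n-grams, so the search text should match exactly one of those n-grams. For that to happen you need a "term", which matches the n-grams exactly: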

"query": {
  "function_score": {
    "query": {
      "bool": {
        "must": { "term": { "clean_artist_name.autocomplete": "bill w" } },
        "should": { "match": { "clean_artist_name": "bill w" } }
      }
    },
    "functions": [
      { "script_score": { "script": "artist-score" } }
    ]
  }
}
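To see why the two query types behave differently, here is a small simulation in plain Python. It is a sketch under stated assumptions, not how Elasticsearch is implemented: the standard analyser is approximated by lowercasing and splitting on non-word characters, "match" is modelled with its default OR behaviour, all function names are invented for illustration, and every artist except 'Bill Withers' is made-up sample data.

```python
import re


def index_tokens(name, min_gram=1, max_gram=20):
    """Edge n-grams stored for the autocomplete sub-field (approximation)."""
    token = name.lower()
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


def standard_tokens(text):
    """Rough stand-in for the standard analyser with no stopwords."""
    return re.findall(r"\w+", text.lower())


def match_hits(query, names):
    """'match': analyse the query, then accept a hit on ANY resulting term."""
    terms = standard_tokens(query)
    return [n for n in names if any(t in index_tokens(n) for t in terms)]


def term_hits(query, names):
    """'term': look the query text up exactly as given, with no analysis."""
    return [n for n in names if query in index_tokens(n)]


# Hypothetical sample data, apart from 'Bill Withers'.
ARTISTS = ["Bill Withers", "Bill Evans", "Wilson Pickett", "Aretha Franklin"]

print(match_hits("bill w", ARTISTS))  # ['Bill Withers', 'Bill Evans', 'Wilson Pickett']
print(term_hits("bill w", ARTISTS))   # ['Bill Withers']
print(term_hits("Bill W", ARTISTS))   # []
```

The last line shows a side effect worth keeping in mind: because "term" skips analysis, the search text is not lowercased for you, so it probably makes sense to lowercase user input in Python before sending it.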

python elasticsearch python-requests
