What's a good database for full text search on a large number of relatively small text documents? (C# backend)
I am designing a system that aims to ingest large numbers of documents. I want to support full text search on the document contents, as well as on other metadata (keyword/sentiment analysis). How the keyword/sentiment analysis is done is beyond the scope of this question, but it is worth considering that this sort of metadata needs to live alongside the searchable documents.
The main assumptions are:
- By "large" I mean a few 100,000, with the goal of reaching millions.
- The documents are 0-15 KB each.
- The documents are text (UTF-8).
- I'd like to be able to full-text search the document contents.
- Everything is hosted on a single machine, with no cloud/distributed services.
- New documents are inserted continuously (roughly 1-2 per second).
- Searches are ad hoc text searches.
- A more complicated query use case would be: show me documents about 'widgets' that are positive, within a date range.
- C# is the language of choice for fetching documents, processing them, and storing them in and retrieving them from the DB. Having C# bindings is a big plus, or at the very least an easy way to bridge the gap.
Naive approach
A naive approach would be to use MySQL along with Apache's Lucene, either having the document contents stored as files with references to them in the DB, or having the document contents as a text field in the database.
Then use one of the C# wrappers for Lucene, such as Lucene.NET.
My concern/question with this approach is whether or not the size of the data is too much for MySQL. I know this is probably a silly premature optimization, and oftentimes people think they need a 'big data' solution when it turns out a regular SQL database is fine. My other main concern is that this approach would be 'clunky' and cumbersome to develop against compared to potential alternatives.
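As a rough sketch of the Lucene side of this approach, the "widgets, positive, date range" use case could be expressed as a single boolean query. This assumes the Lucene.NET 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common); the field names (id, body, sentiment, created), the sample documents, and the choice to store dates as UTC ticks are all made up for illustration, not taken from any existing schema.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion luceneVersion = LuceneVersion.LUCENE_48;

// In-memory index for the sketch; FSDirectory.Open(path) would keep it on disk.
using var dir = new RAMDirectory();
using var analyzer = new StandardAnalyzer(luceneVersion);
using var writer = new IndexWriter(dir, new IndexWriterConfig(luceneVersion, analyzer));

// Hypothetical helper: one Lucene document per ingested text document.
void AddDoc(string id, string body, string sentiment, DateTime createdUtc)
{
    writer.AddDocument(new Document
    {
        new StringField("id", id, Field.Store.YES),              // exact-match key, returned with hits
        new TextField("body", body, Field.Store.NO),             // analyzed for full-text search
        new StringField("sentiment", sentiment, Field.Store.NO), // metadata, not analyzed
        new Int64Field("created", createdUtc.Ticks, Field.Store.NO),
    });
}

AddDoc("1", "Our new widgets shipped early and customers love them", "positive",
       new DateTime(2014, 3, 10, 0, 0, 0, DateTimeKind.Utc));
AddDoc("2", "The widgets broke after a week", "negative",
       new DateTime(2014, 3, 12, 0, 0, 0, DateTimeKind.Utc));
AddDoc("3", "Widgets are great", "positive",
       new DateTime(2012, 6, 1, 0, 0, 0, DateTimeKind.Utc));
writer.Commit();

var from = new DateTime(2014, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var to = new DateTime(2014, 12, 31, 0, 0, 0, DateTimeKind.Utc);

// All three clauses must match: text term, sentiment label, and date range.
var query = new BooleanQuery
{
    { new TermQuery(new Term("body", "widgets")), Occur.MUST },
    { new TermQuery(new Term("sentiment", "positive")), Occur.MUST },
    { NumericRangeQuery.NewInt64Range("created", from.Ticks, to.Ticks, true, true), Occur.MUST },
};

using var reader = DirectoryReader.Open(dir);
var searcher = new IndexSearcher(reader);
foreach (var hit in searcher.Search(query, 10).ScoreDocs)
{
    Console.WriteLine(searcher.Doc(hit.Doc).Get("id"));
}
```

In a sketch like this the metadata lives in the same index as the text, so MySQL would only be needed for whatever is not searched through Lucene. With continuous inserts, the writer would normally stay open and commit periodically rather than after every document.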
Alternatives
From doing some research, one alternative that looks promising is using CouchDB with Lucene. I have come across two libraries that solve this:
- couchdb-lucene
- Divan

What I'm looking for:
I haven't done a whole lot with this size of data. I wonder:
- Does the amount of data and the use case merit a non-relational database?
- Should the documents live in the database, or as files with references in the database?
- Is there a database/full-text-search technology particularly suited to this scenario that I haven't considered?
I suggest RavenDB. It uses Lucene and is 100% .NET. It has text analyzers for doing full text indexing and fuzzy searches.
c# database-design full-text-search sentiment-analysis keyword-search