Saturday 15 March 2014

text - Article scoring algorithm by keywords -




I am looking for an algorithm that can give a score to an article based on weighted keywords.

So suppose I have the following article:

economic anxiety amid dwindling oil , gas industry raising hard questions future. shaping senate race in democrat seeking re-election in state long dominated republicans.

And I have the following keywords, each given a weight (from -100 to 100) of importance:

economic (50) senate (70) republicans (-100) democrats (100)

This means I want an article that is about the economy, the senate, and democrats to have a high end score, while an article about republicans scores low. One simple solution seems to be to add together the values of the keywords occurring in the article. But in reality, an article in which the word democrats occurs 5 times and the word republicans occurs once should still have a low ranking.
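To make the concern concrete, here is a minimal sketch of the simple summing approach, assuming every occurrence of a keyword adds its weight. The helper name `naive_score` and the sample text are made up for illustration; only the weights come from the example above.

```python
import string

# Weights taken from the example above.
keyword_score = {'economic': 50, 'senate': 70, 'republicans': -100, 'democrats': 100}

def naive_score(text, weights):
    """Sum the weight of every keyword occurrence (hypothetical helper)."""
    total = 0
    for token in text.split():
        # Lowercase and strip surrounding punctuation before the lookup.
        word = token.lower().strip(string.punctuation)
        total += weights.get(word, 0)
    return total

# Made-up text: "democrats" five times, "republicans" once.
sample = "democrats " * 5 + "republicans"
print(naive_score(sample, keyword_score))  # 5 * 100 - 100 = 400
```

The repeated positive keyword swamps the single negative one, so this text ends up with a high score even though it mentions republicans.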

My question is: are there efficient and effective algorithms for this problem?

If I have understood you right, you can do this by recording the words you have already scored in a set. An illustration in Python:

    import string

    article = """economic anxiety amid dwindling oil , gas industry raising hard questions future. shaping senate race in democrat seeking re-election in state long dominated republicans."""
    keyword_score = {'economic': 50, 'senate': 70, 'republicans': -100, 'democrats': 100}

    seen_keywords = set()  # keywords that have already contributed to the score
    score = 0
    for word in article.split():
        # Lowercase and strip surrounding punctuation so "republicans." matches.
        word = word.lower().strip(string.punctuation)
        if word in keyword_score and word not in seen_keywords:
            score += keyword_score[word]
            seen_keywords.add(word)
    print(score)

That way, words are not scored twice.
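As a rough check against the 5-versus-1 case from the question, the same idea can be wrapped in a function. This is only a sketch: the name `score_once` and the sample text are hypothetical, and it assumes exact word matches after lowercasing and stripping punctuation.

```python
import string

def score_once(text, weights):
    """Score each keyword at most once, however often it appears."""
    # A set of normalized words collapses repeated occurrences.
    words = {token.lower().strip(string.punctuation) for token in text.split()}
    return sum(weight for keyword, weight in weights.items() if keyword in words)

weights = {'economic': 50, 'senate': 70, 'republicans': -100, 'democrats': 100}

# Made-up text: "democrats" five times, "republicans" once.
sample = "democrats " * 5 + "republicans"
print(score_once(sample, weights))  # 100 - 100 = 0
```

Here the five mentions of democrats no longer outweigh the single mention of republicans. Whether a score of 0 counts as low enough depends on how the scores are ranked. Note also that exact matching means "democrat" in the example article does not hit the "democrats" keyword.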

