Friday 15 January 2010

Extract English words from a text in R




I have a text and need to extract the English words from it. For instance, I want a function that analyses a vector

vector <- c("picture", "carpet", "lamp", "notaword", "anothernotaword")

and returns the English words in the vector, i.e. "picture", "carpet", "lamp".

I understand that the definition of an "English word" depends on the dictionary; I would be satisfied with any basic dictionary.

You could use a package I maintain, qdapDictionaries (there is no need to have the parent package, qdap, installed). If your data is more complex, you may need to use tools such as tolower etc. to make this work. The idea here is to see where a known word list (see ?GradyAugmented) intersects with your words. Here are two similar approaches; the first may be faster, depending on the data:

vector <- c("picture", "carpet", "lamp", "notaword", "anothernotaword")

library(qdapDictionaries)

vector[vector %in% GradyAugmented]
## [1] "picture" "carpet" "lamp"

intersect(vector, GradyAugmented)
## [1] "picture" "carpet" "lamp"
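For messier input (mixed case, punctuation, whole sentences), the lookup can be preceded by a normalisation step. The following is only a minimal sketch: the helper name extract_known_words and the small toy_dictionary are made up for illustration so that the code runs without any extra packages. In practice a fuller word list such as GradyAugmented could be passed as the dictionary instead.

```r
# Hypothetical helper: keep only the tokens found in a reference word list.
extract_known_words <- function(text, dictionary) {
  # Lower-case, then split on anything that is not a letter or an apostrophe
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  # Drop empty strings and repeated tokens
  tokens <- unique(tokens[nzchar(tokens)])
  # Keep the tokens that appear in the (lower-cased) dictionary
  tokens[tokens %in% tolower(dictionary)]
}

# Small stand-in word list (an assumption for this sketch, not a real dictionary)
toy_dictionary <- c("picture", "carpet", "lamp", "the", "on")

raw_text <- c("The Picture, the CARPET", "a lamp on notaword")
result <- extract_known_words(raw_text, toy_dictionary)
print(result)
```

Splitting on non-letters is a deliberately crude tokeniser; text with digits, hyphenated words, or accented characters would need a different pattern.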

Regarding the error you are receiving when installing qdap, it sounds like @Ben Bolker is correct. You need a newer version (I'd suggest the latest version) of data.table installed (use packageVersion("data.table") to check this). It was an oversight on my part not to require a minimal version of data.table; I thought setDT (a function in the data.table package) had always been around, but it appears it is not in your version. To solve this particular problem you wouldn't need to install the parent qdap package, just qdapDictionaries.
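The version check can also be scripted. This is a rough sketch only: has_min_version is a hypothetical helper, and the "1.0.0" threshold is a placeholder, not the version that qdap actually requires.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  # A package that is not installed cannot satisfy any minimum version
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  packageVersion(pkg) >= package_version(min_version)
}

# "1.0.0" is a placeholder threshold chosen for illustration only
if (!has_min_version("data.table", "1.0.0")) {
  message("data.table is missing or too old; try install.packages(\"data.table\")")
}
```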

r text word
