Saturday 15 February 2014

java - Spark TF-IDF getting the words back from hash -




I followed this example from the Spark documentation for calculating TF-IDF over a bunch of documents. Spark uses the hashing trick for these calculations, so at the end you get a vector containing the hashed words and their corresponding weights, but... how can I get the words back from the hash?

Do I really have to hash all the words and save them in a map, then later iterate through it looking for the keywords? Is there no more efficient way built into Spark?

Thanks in advance.

The transformation of a string to a hash in HashingTF results in a non-negative integer between 0 (inclusive) and numFeatures (exclusive; the default is 2^20), computed using org.apache.spark.util.Utils.nonNegativeMod(int, int).
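
To make the index computation concrete, here is a minimal sketch in plain Java with no Spark dependency. The class name HashIndexSketch and the method indexOf are made up for illustration, and using String.hashCode() as the term hash is an assumption: the hash function differs between Spark versions, so the indices printed here will not necessarily match what your Spark job produces.

```java
// Illustrative sketch only; it mimics the index computation described above
// without calling Spark itself.
final class HashIndexSketch {

    // Same contract as nonNegativeMod(int, int): the result is always in
    // [0, mod), even when x is negative.
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod; // in Java this can be negative when x is negative
        return rawMod + (rawMod < 0 ? mod : 0);
    }

    // Assumption: the term hash is String.hashCode(). Spark versions differ
    // in which hash function they apply to a term.
    static int indexOf(String term, int numFeatures) {
        return nonNegativeMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        int numFeatures = 1 << 20; // 2^20, the default mentioned above
        System.out.println(indexOf("spark", numFeatures));
        System.out.println(nonNegativeMod(-7, 5)); // -7 % 5 is -2 in Java, so this prints 3
    }
}
```

Since many different hash codes map to the same remainder, the index alone cannot identify the term it came from.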

The original string is lost; there is no way to convert the resulting integer back into the input string.
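
If you need the words anyway, the map approach from the question is the usual workaround: hash every term in your vocabulary yourself and remember which terms landed on which index. Below is a rough sketch in plain Java. ReverseIndexSketch, indexOf and buildReverseIndex are hypothetical names, not Spark API, and String.hashCode() is again only an assumption about the hash function, so in a real job you would have to compute the index the same way your Spark version does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; none of these names are part of Spark.
final class ReverseIndexSketch {

    // Assumption: String.hashCode() reduced to [0, numFeatures) with a
    // non-negative modulus, as described in the answer above.
    static int indexOf(String term, int numFeatures) {
        int rawMod = term.hashCode() % numFeatures;
        return rawMod + (rawMod < 0 ? numFeatures : 0);
    }

    // Maps each index to every term that hashes to it. A set is needed
    // because distinct terms can collide on the same index.
    static Map<Integer, Set<String>> buildReverseIndex(Iterable<String> terms, int numFeatures) {
        Map<Integer, Set<String>> reverse = new HashMap<>();
        for (String term : terms) {
            reverse.computeIfAbsent(indexOf(term, numFeatures), k -> new TreeSet<>()).add(term);
        }
        return reverse;
    }

    public static void main(String[] args) {
        // A deliberately tiny feature space so that a collision shows up.
        Map<Integer, Set<String>> reverse = buildReverseIndex(Arrays.asList("a", "q", "b"), 16);
        System.out.println(reverse.get(1)); // "a" (97) and "q" (113) both leave remainder 1
        System.out.println(reverse.get(2)); // "b" (98) leaves remainder 2
    }
}
```

Note that a lookup can return several candidate terms for one index, so even with the map you only recover the set of words that share a hash bucket, not a guaranteed single word.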

java hash apache-spark tf-idf
