java - Spark TF-IDF: getting the words back from the hash
I am following the example in the Spark documentation for calculating TF-IDF over a bunch of documents. Spark uses the hashing trick for the calculations, so at the end I get a vector containing the hashed words and their corresponding weights. But how can I get the words back from the hash?
Do I really have to hash all the words myself, save them in a map, and later iterate through it looking for the keywords? Is there no more efficient way built into Spark?
Thanks in advance.
The transformation of a String into a hash in HashingTF results in a non-negative integer between 0 (inclusive) and numFeatures (exclusive; default 2^20), computed with org.apache.spark.util.Utils.nonNegativeMod(int, int).
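To make that mapping concrete, here is a minimal standalone Java sketch of the term-to-index step. It ports nonNegativeMod as a plain static method and, as an assumption, uses String.hashCode() as the term hash (the older "native" HashingTF behavior). Newer Spark releases default to MurmurHash3, so the real indices will differ. NonNegativeModSketch and indexOf here are illustrative names, not Spark API.

```java
public class NonNegativeModSketch {
    // Default feature-space size quoted in the answer: 2^20 buckets.
    static final int NUM_FEATURES = 1 << 20;

    // Java port of Spark's Utils.nonNegativeMod: a modulo that never returns a negative value.
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod;
        return rawMod + (rawMod < 0 ? mod : 0);
    }

    // ASSUMPTION: the term hash is String.hashCode() (the old "native" algorithm).
    // Newer Spark versions default to MurmurHash3, so indices there will differ.
    static int indexOf(String term, int numFeatures) {
        return nonNegativeMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"spark", "tf", "idf"}) {
            // Every index lands in [0, NUM_FEATURES); the string itself is not kept.
            System.out.println(term + " -> " + indexOf(term, NUM_FEATURES));
        }
    }
}
```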
The original string is lost: many different strings can map to the same index, so there is no way to convert the resulting integer back into the input string.
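Because the mapping is many-to-one, the only way back is roughly what the question suggests: record each term's index while you still have the terms. Below is a minimal sketch, assuming String.hashCode() as the term hash; HashReverseLookup and buildReverseIndex are hypothetical names. It keeps a set of terms per index, since colliding terms share a bucket. In a real Spark job you would compute the indices with the same HashingTF configuration used for the features (for example via its indexOf method, where your Spark version provides one) rather than reimplementing the hash.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HashReverseLookup {
    // Port of Spark's Utils.nonNegativeMod (never returns a negative bucket).
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod;
        return rawMod + (rawMod < 0 ? mod : 0);
    }

    // Builds index -> set of terms seen in the corpus.
    // ASSUMPTION: term hash is String.hashCode(); this must match whatever
    // hash your HashingTF actually uses, or the lookup is meaningless.
    static Map<Integer, Set<String>> buildReverseIndex(List<List<String>> docs, int numFeatures) {
        Map<Integer, Set<String>> reverse = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                int idx = nonNegativeMod(term.hashCode(), numFeatures);
                // A set per bucket, because distinct terms can collide.
                reverse.computeIfAbsent(idx, k -> new TreeSet<>()).add(term);
            }
        }
        return reverse;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("spark", "hashing", "trick"),
                List.of("Aa", "BB")); // "Aa" and "BB" share String.hashCode() == 2112
        Map<Integer, Set<String>> reverse = buildReverseIndex(docs, 1 << 20);
        System.out.println(reverse.get(2112)); // prints [Aa, BB]
    }
}
```

Each set shows which words contributed to a given weight in the TF-IDF vector; with 2^20 buckets and a modest vocabulary, most sets should hold a single term.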
java hash apache-spark tf-idf