Sunday 15 September 2013

python - What is a scipy.sparse matrix in the CSR format? -




I'm new to scikit-learn and SciPy, and I tried the following:

    # -*- coding: utf-8 -*-
    from sklearn.feature_extraction import FeatureHasher

    data = [[('this', 'is'), ('is', 'a'), ('a', 'text')],
            [('and', 'one'), ('one', 'more')]]

    fh = FeatureHasher(input_type='string')
    # Each sample is passed as a sequence of strings: 'this is', 'is a', ...
    x = fh.transform(((' '.join(pair) for pair in sample) for sample in data))
    print(x)

The problem is that I don't understand the output:

    (0, 18882)      1.0
    (0, 908056)     1.0
    (0, 1003453)    1.0
    (1, 433727)     1.0
    (1, 575892)     -1.0

Could someone explain what this output means? I read the documentation for FeatureHasher but didn't understand it.

This is the display of a large sparse matrix, as implemented in scipy.sparse.

    (0, 18882)      1.0
    (0, 908056)     1.0
    (0, 1003453)    1.0
    (1, 433727)     1.0
    (1, 575892)     -1.0

x.shape gives its dimensions. x.todense() produces a regular numpy matrix, with a lot of 0 values.

Here's a sample of a much smaller sparse matrix:

    In [182]: from scipy import sparse

    In [183]: x = sparse.csr_matrix([[0,1,2],[1,0,0]])

    In [184]: x
    Out[184]:
    <2x3 sparse matrix of type '<type 'numpy.int32'>'
        with 3 stored elements in Compressed Sparse Row format>

    In [185]: print x
      (0, 1)    1
      (0, 2)    2
      (1, 0)    1

    In [186]: x.todense()
    Out[186]:
    matrix([[0, 1, 2],
            [1, 0, 0]])

    In [187]: x.toarray()
    Out[187]:
    array([[0, 1, 2],
           [1, 0, 0]])

print x shows only the nonzero values of the matrix, one per line, in (row, col) value format.
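To connect that printed form to the CSR format the title asks about, here is a minimal sketch that rebuilds the same (row, col, value) triples from the three arrays a csr_matrix stores: data, indices and indptr. The helper name csr_triples is made up for this illustration and is not part of SciPy.

```python
from scipy import sparse

x = sparse.csr_matrix([[0, 1, 2], [1, 0, 0]])

# CSR keeps three flat arrays:
#   data    - the stored values, row by row
#   indices - the column index of each stored value
#   indptr  - row i occupies data[indptr[i]:indptr[i + 1]]
print(x.data.tolist())
print(x.indices.tolist())
print(x.indptr.tolist())


def csr_triples(m):
    """Hypothetical helper: list stored entries as (row, col, value)."""
    triples = []
    for row in range(m.shape[0]):
        start, end = m.indptr[row], m.indptr[row + 1]
        for col, value in zip(m.indices[start:end], m.data[start:end]):
            triples.append((row, col.item(), value.item()))
    return triples


triples = csr_triples(x)
print(triples)
```

Row 0 owns the first two stored values and row 1 owns the third, which is why indptr has one more entry than the matrix has rows.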

Your x is at least a (2, 1003454) matrix, mostly zeros. (With FeatureHasher's default n_features of 2**20, x.shape should report (2, 1048576).)
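The lower bound on the shape can be checked without scikit-learn. The sketch below rebuilds a sparse matrix from the five printed entries and uses the smallest shape that can hold them; that shape is an assumption for illustration, since the real matrix returned by FeatureHasher may be wider.

```python
from scipy import sparse

# (row, col, value) entries copied from the printed output in the question.
entries = [
    (0, 18882, 1.0),
    (0, 908056, 1.0),
    (0, 1003453, 1.0),
    (1, 433727, 1.0),
    (1, 575892, -1.0),
]
rows, cols, vals = zip(*entries)

# Indices are 0-based, so the smallest possible shape is (max index + 1).
min_shape = (max(rows) + 1, max(cols) + 1)

x = sparse.coo_matrix((vals, (rows, cols)), shape=min_shape).tocsr()

print(x.shape)  # the minimum shape, not necessarily the hasher's real one
print(x.nnz)    # number of stored (nonzero) entries

# Fraction of cells that actually hold a value.
density = x.nnz / float(x.shape[0] * x.shape[1])
print(density)
```

Only 5 of roughly two million cells are stored, which is why the printed form lists coordinates instead of showing the full matrix.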

python numpy scipy scikit-learn
