Python: What is a scipy.sparse matrix in the CSR format?
I'm new to scikit-learn and SciPy, and I tried the following:
# -*- coding: utf-8 -*-
from sklearn.feature_extraction import FeatureHasher

data = [[('this', 'is'), ('is', 'a'), ('a', 'text')],
        [('and', 'one'), ('one', 'more')]]

fh = FeatureHasher(input_type='string')
# each sample becomes strings such as 'this is', 'is a', 'a text'
x = fh.transform(((' '.join(x) for x in sample) for sample in data))
print(x)
The problem is that I don't understand the output:
(0, 18882)    1.0
(0, 908056)   1.0
(0, 1003453)  1.0
(1, 433727)   1.0
(1, 575892)  -1.0
Could someone explain what this output means? I read the documentation of the FeatureHasher() method but didn't understand it.
This is how scipy.sparse displays a big sparse matrix: only the stored (nonzero) entries are listed.
x.shape gives its dimensions, and x.todense() produces a regular NumPy matrix with a lot of 0 values.
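To see the shape and the dense form without a million columns, here is a minimal sketch (assuming scikit-learn is installed) that reruns the question's input with n_features=8, a value picked only for illustration. FeatureHasher's default is n_features=2**20 = 1048576, which is why the column indices above are so large. With only 8 buckets some bigrams may collide, so the dense values will not match a full-size run.

```python
from sklearn.feature_extraction import FeatureHasher

data = [[('this', 'is'), ('is', 'a'), ('a', 'text')],
        [('and', 'one'), ('one', 'more')]]

# Each sample becomes a list of strings such as "this is", "is a", ...
samples = [[' '.join(pair) for pair in sample] for sample in data]

# n_features=8 is an illustrative choice, not the default (2**20).
fh = FeatureHasher(n_features=8, input_type='string')
x = fh.transform(samples)        # scipy.sparse CSR matrix

print(x.shape)                   # (2, 8): one row per sample, one column per hash bucket
print(x.toarray())               # dense NumPy array; zeros are empty buckets
```

Shrinking n_features trades more hash collisions for a smaller matrix; it is useful here only to make the dense form printable.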
Here's a sample of a much smaller sparse matrix:
In [182]: from scipy import sparse

In [183]: x = sparse.csr_matrix([[0, 1, 2], [1, 0, 0]])

In [184]: x
Out[184]:
<2x3 sparse matrix of type '<type 'numpy.int32'>'
    with 3 stored elements in Compressed Sparse Row format>

In [185]: print(x)
  (0, 1)    1
  (0, 2)    2
  (1, 0)    1

In [186]: x.todense()
Out[186]:
matrix([[0, 1, 2],
        [1, 0, 0]])

In [187]: x.toarray()
Out[187]:
array([[0, 1, 2],
       [1, 0, 0]])
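As for the question in the title, CSR (Compressed Sparse Row) stores a matrix as three 1-D arrays. This sketch uses the same small matrix, prints scipy's data, indices and indptr attributes, and rebuilds the dense rows from them by hand. The hand-written loop is only illustrative; toarray() does the same job.

```python
from scipy import sparse

x = sparse.csr_matrix([[0, 1, 2], [1, 0, 0]])

# CSR keeps three 1-D arrays instead of the full 2-D grid:
print(x.data)     # stored values, row by row:          [1 2 1]
print(x.indices)  # column index of each stored value:  [1 2 0]
print(x.indptr)   # row i uses data[indptr[i]:indptr[i+1]]: [0 2 3]

# Rebuild the dense rows by hand from those three arrays.
n_rows, n_cols = x.shape
dense = [[0] * n_cols for _ in range(n_rows)]
for i in range(n_rows):
    start, end = x.indptr[i], x.indptr[i + 1]
    for col, val in zip(x.indices[start:end], x.data[start:end]):
        dense[i][int(col)] = int(val)

print(dense)      # [[0, 1, 2], [1, 0, 0]]
```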
Printing x shows the nonzero (stored) values of the matrix, one per line, in "(row, col) value" format.
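The same triples can also be pulled out programmatically. One way, sketched here, is to convert to COO format, whose row, col and data arrays line up one-to-one with the printed lines.

```python
from scipy import sparse

x = sparse.csr_matrix([[0, 1, 2], [1, 0, 0]])

# tocoo() exposes the stored entries as parallel row/col/data arrays,
# matching the "(row, col) value" lines that print(x) shows.
coo = x.tocoo()
triples = [(int(r), int(c), int(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]
for r, c, v in triples:
    print((r, c), v)
# (0, 1) 1
# (0, 2) 2
# (1, 0) 1
```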
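Your x is at least a (2, 1003454) matrix, since the largest column index shown is 1003453, and it is mostly zeros. With FeatureHasher's default n_features=2**20, its actual shape is (2, 1048576).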
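To make "mostly zeros" concrete, this sketch rebuilds the question's output directly from the five printed triples, assuming the width comes from FeatureHasher's default n_features=2**20. It then compares the bytes held by the CSR arrays with what a dense float64 array of the same shape would need.

```python
import numpy as np
from scipy import sparse

# The five (row, col) value entries printed in the question.
rows = [0, 0, 0, 1, 1]
cols = [18882, 908056, 1003453, 433727, 575892]
vals = [1.0, 1.0, 1.0, 1.0, -1.0]

# Assumption: the width is FeatureHasher's default n_features = 2**20.
x = sparse.csr_matrix((vals, (rows, cols)), shape=(2, 2**20))

print(x.shape)   # (2, 1048576)
print(x.nnz)     # 5 stored values

# Memory held by the CSR arrays vs. a dense float64 array of the same shape.
sparse_bytes = x.data.nbytes + x.indices.nbytes + x.indptr.nbytes
dense_bytes = x.shape[0] * x.shape[1] * np.dtype(np.float64).itemsize
print(sparse_bytes, dense_bytes)
```

The dense version needs 16 MiB to hold five meaningful numbers, which is why FeatureHasher returns a sparse matrix.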