Breedlove: scikit learn - How to use multiple input features with associated extractors in a pipeline? -

Wednesday 15 January 2014

I am working on a classification task with scikit-learn. I have a data set in which each observation comprises two separate text fields. I want to set up a pipeline in which each text field is passed, in parallel, through its own TfidfVectorizer, and the outputs of the two TfidfVectorizer objects are passed to a classifier. My aim is to be able to optimize the parameters of the two TfidfVectorizer objects along with those of the classifier, using GridSearchCV.

The pipeline might be depicted as follows:

text 1 -> TfidfVectorizer 1 --------|
                                    +---> classifier
text 2 -> TfidfVectorizer 2 --------|

I understand how to do this without using a Pipeline (by creating the TfidfVectorizer objects and working from there), but how do I set it up within a pipeline?

Thanks for any help,

Rob.

Use the Pipeline and FeatureUnion classes. The code for your case would look like:

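    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import FeatureUnion, Pipeline

    # ExtractText1 and ExtractText2 are not scikit-learn classes; they stand for
    # user-defined transformers that each pull one text field out of the input.
    pipeline = Pipeline([
        ('features', FeatureUnion([
            ('c1', Pipeline([
                ('text1', ExtractText1()),
                ('tf_idf1', TfidfVectorizer())
            ])),
            ('c2', Pipeline([
                ('text2', ExtractText2()),
                ('tf_idf2', TfidfVectorizer())
            ]))
        ])),
        ('classifier', MultinomialNB())
    ])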
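
The answer does not say what ExtractText1 and ExtractText2 look like, so here is one possible sketch. It assumes each observation is a dict with the keys "text1" and "text2"; that layout, the shared _ExtractField base class and the sample observations are illustrative assumptions, not part of the original answer.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class _ExtractField(TransformerMixin, BaseEstimator):
    """Stateless transformer returning one text field per observation."""

    field = None  # set by the subclasses below

    def fit(self, X, y=None):
        # Nothing to learn: selecting a field needs no fitted state.
        return self

    def transform(self, X):
        # Assumption: X is a sequence of dicts holding both text fields.
        return [row[self.field] for row in X]


class ExtractText1(_ExtractField):
    field = "text1"


class ExtractText2(_ExtractField):
    field = "text2"


# Made-up observations, only to show what each extractor returns.
observations = [
    {"text1": "first field of row one", "text2": "second field of row one"},
    {"text1": "first field of row two", "text2": "second field of row two"},
]
texts1 = ExtractText1().fit_transform(observations)
texts2 = ExtractText2().fit_transform(observations)
```

Each extractor hands a plain list of strings to the TfidfVectorizer that follows it, which is the input type TfidfVectorizer expects.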

You can then grid search over the entire construction, referring to parameters using the <estimator1>__<estimator2>__<parameter> syntax. For example, features__c1__tf_idf1__min_df refers to the min_df parameter of TfidfVectorizer 1 in the diagram.
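
As a rough end-to-end sketch of that grid search, the following runs GridSearchCV over the nested parameter names. It assumes a recent scikit-learn release (where GridSearchCV is imported from sklearn.model_selection) and observations stored as dicts. FieldExtractor is a hypothetical single class playing the role of the ExtractText1 and ExtractText2 steps, and the toy rows, labels and grid values are made up for illustration.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


class FieldExtractor(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for the ExtractText1 / ExtractText2 steps."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # Assumption: each observation is a dict holding both text fields.
        return [row[self.key] for row in X]


pipeline = Pipeline([
    ("features", FeatureUnion([
        ("c1", Pipeline([
            ("text1", FieldExtractor("text1")),
            ("tf_idf1", TfidfVectorizer()),
        ])),
        ("c2", Pipeline([
            ("text2", FieldExtractor("text2")),
            ("tf_idf2", TfidfVectorizer()),
        ])),
    ])),
    ("classifier", MultinomialNB()),
])

# Each key walks down the nesting: <step>__<nested step>__<parameter>.
param_grid = {
    "features__c1__tf_idf1__min_df": [1, 2],
    "features__c2__tf_idf2__ngram_range": [(1, 1), (1, 2)],
    "classifier__alpha": [0.1, 1.0],
}

# Toy data (illustrative only): two text fields per observation, two classes.
rows = [
    {"text1": "fresh fruit apple", "text2": "sweet fruit snack"},
    {"text1": "fresh fruit pear", "text2": "juicy fruit snack"},
    {"text1": "ripe fruit plum", "text2": "sweet fruit dessert"},
    {"text1": "ripe fruit grape", "text2": "juicy fruit dessert"},
    {"text1": "fast car engine", "text2": "loud car race"},
    {"text1": "fast car wheel", "text2": "quick car race"},
    {"text1": "old car brake", "text2": "loud car trip"},
    {"text1": "old car door", "text2": "quick car trip"},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Both vectorizers and the classifier are tuned in a single search.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(rows, labels)
print(search.best_params_)
```

Calling pipeline.get_params().keys() lists every parameter name that can be used as a key in param_grid, which is a quick way to check the spelling of a nested name.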

scikit-learn
