scikit-learn - How to use multiple input features with associated extractors in a pipeline?
I am working on a classification task with scikit-learn. I have a data set in which each observation comprises two separate text fields. I want to set up a Pipeline in which each text field is passed in parallel through its own TfidfVectorizer, and the outputs of the two TfidfVectorizer objects are passed to a classifier. My aim is to be able to optimize the parameters of the two TfidfVectorizer objects along with those of the classifier, using GridSearchCV.
The pipeline might be depicted as follows:
Text 1 -> TfidfVectorizer 1 --------|
                                    +---> Classifier
Text 2 -> TfidfVectorizer 2 --------|
I understand how to do this without using a Pipeline (by just creating the TfidfVectorizer objects and working from there), but how do I set this up within a Pipeline?
Thanks for any help,
Rob.
Use the Pipeline and FeatureUnion classes. The code for your case would look something like:
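    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # ExtractText1 / ExtractText2 stand for your own transformers that
    # pull out each text field; they are not defined here.
    pipeline = Pipeline([
        ('features', FeatureUnion([
            ('c1', Pipeline([
                ('text1', ExtractText1()),
                ('tf_idf1', TfidfVectorizer())
            ])),
            ('c2', Pipeline([
                ('text2', ExtractText2()),
                ('tf_idf2', TfidfVectorizer())
            ]))
        ])),
        ('classifier', MultinomialNB())
    ])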
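The extractor steps in the code above are not defined in this answer, so here is a minimal sketch of what one could look like. It assumes each observation is a dict with keys "text1" and "text2"; the class name FieldExtractor, the key names, and the sample rows are all hypothetical and only for illustration.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class FieldExtractor(TransformerMixin, BaseEstimator):
    """Hypothetical helper: select one text field from each observation."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        # Stateless transformer: there is nothing to learn.
        return self

    def transform(self, X):
        # X is assumed to be a sequence of dicts, one per observation.
        return [row[self.key] for row in X]


# Made-up sample data, just to show the shape of the input and output.
rows = [
    {"text1": "red apple", "text2": "sweet fruit"},
    {"text1": "green pear", "text2": "juicy fruit"},
]
first_field = FieldExtractor("text1").fit_transform(rows)
```

With something like this, FieldExtractor("text1") and FieldExtractor("text2") could take the place of the two extractor steps, each feeding a list of strings to its own TfidfVectorizer.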
You can then grid search over the entire construction, referring to parameters using the <estimator1>__<estimator2>__<parameter> syntax. For example, features__c1__tf_idf1__min_df refers to the min_df parameter of TfidfVectorizer 1 in your diagram.
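As a rough end-to-end sketch of that grid search, the example below tunes one parameter on each vectorizer plus the classifier's alpha. The toy rows, the labels, the select_field helper, and the grid values are assumptions made up for illustration; FunctionTransformer is used here as a stand-in for the extractor steps.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def select_field(rows, key):
    # Hypothetical extractor: pull one text field out of each dict.
    return [row[key] for row in rows]


# Toy data: each observation has two separate text fields.
rows = [
    {"text1": "great match today", "text2": "the team won"},
    {"text1": "close match tonight", "text2": "home team lost"},
    {"text1": "match report", "text2": "team news update"},
    {"text1": "big match preview", "text2": "team injury list"},
    {"text1": "new software release", "text2": "faster computer chips"},
    {"text1": "software bug fixed", "text2": "computer security patch"},
    {"text1": "open software tools", "text2": "computer memory prices"},
    {"text1": "software update notes", "text2": "computer hardware review"},
]
labels = ["sports"] * 4 + ["tech"] * 4

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("c1", Pipeline([
            ("text1", FunctionTransformer(select_field, kw_args={"key": "text1"})),
            ("tf_idf1", TfidfVectorizer()),
        ])),
        ("c2", Pipeline([
            ("text2", FunctionTransformer(select_field, kw_args={"key": "text2"})),
            ("tf_idf2", TfidfVectorizer()),
        ])),
    ])),
    ("classifier", MultinomialNB()),
])

# Keys follow <step>__<sub-step>__<parameter>, one level per nesting.
param_grid = {
    "features__c1__tf_idf1__min_df": [1, 2],
    "features__c2__tf_idf2__ngram_range": [(1, 1), (1, 2)],
    "classifier__alpha": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(rows, labels)
print(search.best_params_)
```

Calling pipeline.get_params().keys() lists every parameter name that can appear in the grid, which is a quick way to check the double-underscore paths.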