Monday 15 June 2015

Errors when running caret package in R




I am attempting to build a model that predicts whether a product is sold on an e-commerce website, with 1 or 0 being the output.

My data has a handful of categorical variables (one of them with a large number of levels), a couple of binary variables, and one continuous variable (the price). The output variable is 1 or 0, depending on whether or not the product listing got sold.

This is my code:

library(caret)  # createDataPartition(), trainControl(), train(), twoClassSummary()
library(gbm)    # gbm()

intrainingset <- createDataPartition(c$sale, p = .75, list = FALSE)
ctrain <- c[intrainingset, ]
ctest  <- c[-intrainingset, ]

gbmfit <- gbm(sale ~ ., data = c, distribution = "bernoulli", n.trees = 5,
              interaction.depth = 7, shrinkage = .01)
plot(gbmfit)

gbmtune <- train(sale ~ ., data = ctrain, method = "gbm")

ctrl <- trainControl(method = "repeatedcv", repeats = 5)
gbmtune <- train(sale ~ ., data = ctrain, method = "gbm", verbose = FALSE, trControl = ctrl)

ctrl <- trainControl(method = "repeatedcv", repeats = 5, classProbs = TRUE,
                     summaryFunction = twoClassSummary)
# First error: the formula, data and metric are passed to trainControl()
# instead of train(), so R reports them as unused arguments.
gbmtune <- trainControl(sale ~ ., data = ctrain, method = "gbm", metric = "ROC",
                        verbose = FALSE, trControl = ctrl)

grid <- expand.grid(.interaction.depth = seq(1, 7, by = 2),
                    .n.trees = seq(100, 300, by = 50),
                    .shrinkage = c(.01, .1))
gbmtune <- train(sale ~ ., data = ctrain, method = "gbm", metric = "ROC",
                 tuneGrid = grid, verbose = FALSE, trControl = ctrl)

set.seed(1)
gbmtune <- train(sale ~ ., data = ctrain, method = "gbm", metric = "ROC",
                 tuneGrid = grid, verbose = FALSE, trControl = ctrl)

I am running into two issues. The first happens when I try to add summaryFunction = twoClassSummary and then tune; I get this:

Error in trainControl(sale ~ ., data = ctrain, method = "gbm", metric = "ROC", :
  unused arguments (data = ctrain, metric = "ROC", trControl = ctrl)
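The first error comes from handing train()'s arguments (formula, data, metric, trControl) to trainControl(), which only builds the resampling settings. A minimal sketch of the intended call follows, assuming the caret and gbm packages are installed; the simulated data frame, its columns and the "yes"/"no" labels are hypothetical stand-ins for the poster's data, not the original dataset.

```r
library(caret)  # train(), trainControl(), twoClassSummary(); method = "gbm" also needs gbm installed

set.seed(1)
n <- 200
# Hypothetical stand-in data: a price, one categorical feature and a
# two-class factor outcome whose levels are valid R names.
ctrain <- data.frame(
  price    = runif(n, 10, 100),
  category = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
ctrain$sale <- factor(ifelse(ctrain$price + rnorm(n, sd = 15) < 55, "yes", "no"),
                      levels = c("yes", "no"))

# trainControl() only describes the resampling scheme and the summary
# function; it has no formula, data or metric arguments.
ctrl <- trainControl(method = "repeatedcv", repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

# The model is fitted by train(), which receives the control object via trControl.
gbmtune <- train(sale ~ ., data = ctrain, method = "gbm",
                 metric = "ROC", verbose = FALSE, trControl = ctrl)
print(gbmtune)
```

With a two-level factor outcome and classProbs = TRUE, the resampling results include an ROC column that train() can use to pick the tuning parameters.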

The second problem: if I decide to bypass summaryFunction, I get this error when I try to run the model:

Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, :
  train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
  cannnot compute class probabilities for regression

I tried changing the output variable from a numeric value of 1 or 0 to a text value in Excel, but it didn't make a difference.

Any help would be appreciated on how to fix the fact that it is interpreting the model as a regression, or on the first error message I am encountering.

Best,

Will
will@nubimetrics.com

Your outcome is:

sale = c(1L, 0L, 1L, 1L, 0L)

Although gbm expects it that way, it is a pretty unnatural way to encode the data. Every other function uses factors.

So if you give train() numeric 0/1 data, it thinks you want regression. If you convert the outcome to a factor and use "0" and "1" as the levels (and you want class probabilities), you should have seen a warning that says "At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to...". That is not an idle warning.
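Both points can be checked in base R without caret, using the outcome quoted above. This is a small illustrative sketch: is.factor() shows why a numeric 0/1 vector is not treated as a class label, and make.names() illustrates the kind of rewriting of "0" and "1" into syntactic names that the warning refers to.

```r
# The outcome as quoted in the answer: integer 0/1 values.
sale <- c(1L, 0L, 1L, 1L, 0L)
is.factor(sale)             # FALSE: a numeric outcome, hence regression

# A plain factor() conversion keeps "0" and "1" as the level names ...
sale_f <- factor(sale)
levels(sale_f)              # "0" "1"

# ... which are not syntactically valid R names; make.names() shows how
# base R would rewrite them.
make.names(levels(sale_f))  # "X0" "X1"
```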

Use factor levels that are valid R variable names and you should be fine.
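A minimal sketch of that fix, assuming "yes"/"no" as the labels (any syntactically valid names would do). Listing 1 first in levels makes "yes" the first factor level; as far as I know, caret's twoClassSummary() treats the first level as the event of interest when reporting sensitivity and specificity, so the order is a deliberate choice here.

```r
sale <- c(1L, 0L, 1L, 1L, 0L)

# Relabel 1/0 as "yes"/"no" (hypothetical labels), with "yes" as the first level.
sale_f <- factor(sale, levels = c(1, 0), labels = c("yes", "no"))

levels(sale_f)                                     # "yes" "no"
as.character(sale_f)                               # "yes" "no" "yes" "yes" "no"
all(make.names(levels(sale_f)) == levels(sale_f))  # TRUE: already valid R names
```

The relabelled factor can then replace the numeric column in the data frame before calling train().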

Max

