R - Error in { : task 1524 failed - "cannot open the connection"
I can't seem to find any help on this problem on the internet.
I'm running a function in parallel using the 'foreach' and 'doParallel' packages. The function takes a trained model and 2 data frames as input, makes predictions, shuffles the values of one of the variables, and makes predictions again. It then calculates the RMSE for each variable and returns the variables whose RMSE increases after shuffling. This takes quite a long time, so I have to run it in parallel. Even so, it still takes about 2 hours per model.
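As a rough illustration of the procedure just described (a simplified, sequential sketch in base R, not the actual function): the names `calc_rmse` and `permutation_rmse` and the toy `lm` model are assumptions made up for this example.

```r
# Hypothetical helper: root-mean-square error between predictions and outcomes.
calc_rmse <- function(predicted, observed) {
  sqrt(mean((predicted - observed)^2))
}

# For each variable, shuffle only that column, predict again, and record
# the RMSE before and after shuffling.
permutation_rmse <- function(model, newdata, newoutcomes) {
  baseline <- calc_rmse(predict(model, newdata = newdata), newoutcomes)
  rows <- lapply(names(newdata), function(variable) {
    sdata <- newdata
    sdata[[variable]] <- sample(sdata[[variable]])
    shuffled <- calc_rmse(predict(model, newdata = sdata), newoutcomes)
    data.frame(variable = variable, baseline = baseline, shuffle = shuffled,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy data (assumption): y depends strongly on x1 and not on x2.
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 3 * toy$x1 + rnorm(200, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = toy)

result <- permutation_rmse(fit, toy[c("x1", "x2")], toy$y)

# Keep the variables whose RMSE increases after shuffling.
keep <- result$variable[result$shuffle - result$baseline > 0]
print(result)
print(keep)
```

A variable the model relies on should show a clearly higher RMSE once its column is shuffled, which is the criterion used to keep it.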
It doesn't seem to be an issue with the function's code itself, but maybe with the input, because I've run it before without issue, and when I checked the log file after the error, it had processed all of the variables. I have 5 models that I want to run this function on. I ran it on one model first and saved the results. That worked, so now I want to apply it to the remaining models.
Something seems to be going wrong after the foreach loop is done processing, since the log file indicates that all variables were analyzed. What I don't understand is why the traceback indicates that the error is occurring within the loop.
Thanks in advance for any help with this issue. Let me know if I'm not clear about anything. I'm running Windows 7 and R version 3.1.
Here is the error:
Error in { : task 1524 failed - "cannot open the connection"
10 stop(simpleError(msg, call = expr))
9 e$fun(obj, substitute(ex), parent.frame(), e$data)
8 foreach(variable = names(newdata), .export = c("calc.rmse", "catf", "start.timer", "stop.timer"), .combine = "rbind") %dopar% { baseline = NULL ... at feature_selection.r#53
7 FUN(c("ph", "ca", "p", "sand")[[1L]], ...)
6 lapply(X = X, FUN = FUN, ...)
5 sapply(names(amodels[2:length(amodels)]), analyze.features, newdata = test.data, newoutcomes = test.outcomes) at script.r#59
4 eval(expr, envir, enclos)
3 eval(ei, envir)
2 withVisible(eval(ei, envir))
1 source("~/%filepath%")
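One possible way to narrow down which task fails, and with what message, is to wrap the body of each task in `tryCatch` so that a failure is returned as a value instead of aborting the whole run. The sketch below uses base R `lapply` so it runs on its own; `run_task` is a hypothetical stand-in that fails on purpose, and the same wrapper could presumably be placed inside a `%dopar%` body.

```r
# Hypothetical task body: fails for one variable to imitate a worker error.
run_task <- function(variable) {
  if (variable == "p") stop("simulated failure while opening a file")
  paste("analyzed", variable)
}

variables <- c("ph", "ca", "p", "sand")

# Each task returns a small record; errors are captured, not rethrown.
results <- lapply(variables, function(variable) {
  tryCatch(
    list(variable = variable, ok = TRUE, value = run_task(variable)),
    error = function(e) {
      list(variable = variable, ok = FALSE, value = conditionMessage(e))
    }
  )
})

# Report only the tasks that failed.
failed <- Filter(function(r) !r$ok, results)
for (r in failed) {
  cat("task for", r$variable, "failed:", r$value, "\n")
}
```

Returning a record per task keeps the successful results and makes it visible which variable was being processed when the error was raised.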
Here is the code for the function in question:
analyze.features = function(newdata, newoutcomes, model.name) {
  model = amodels[[model.name]]
  file = "data/shuffled_data.csv"
  if (!file.exists(file)) {
    cat("Creating shuffled data frame...\r\n")
    shuffled.data = as.data.frame(sapply(newdata, shuffle))
    cat("Writing shuffled data frame to disk...\r\n")
    write.csv(shuffled.data, file)
  } else {
    cat("Reading shuffled data file...\r\n")
    shuffled.data = read.csv(file)
  }

  # Send output to log file.
  writeLines("", "log.txt")

  start.timer("About to enter parallelization...")
  cat("Time is: ", format(Sys.time(), "%a %b %d %X %Y"), "\r\n")
  output = foreach(variable = names(newdata),
                   .export = c("calc.rmse", "catf", "start.timer", "stop.timer"),
                   .combine = "rbind") %dopar% {
    baseline = NULL
    shuffle = NULL
    sdata = newdata

    # Write to log file.
    catf("Analyzing ", variable)

    # Replace one variable with its shuffled version, then compare RMSEs.
    sdata[[variable]] = shuffled.data[[variable]]
    baseline[[variable]] = suppressWarnings(calc.rmse(predict(model, newdata = newdata), newoutcomes))
    shuffle[[variable]] = suppressWarnings(calc.rmse(predict(model, newdata = sdata), newoutcomes))
    cbind(baseline = baseline, shuffle = shuffle)
  }
  stop.timer("Total time to analyze features")

  save.df(output, paste("rmse_", model.name, sep = ""))

  # Cut down the list of kept features.
  keep = row.names(output)[which(output[, 2] - output[, 1] > 0)]

  rm(output, shuffled.data)
  beep(1)
  return(keep)
}
I managed to get it to work. I had to rewrite how I was calling the function. Originally, I was calling it through the sapply function:
sapply(names(amodels), analyze.features, newdata=test.data, newoutcomes=test.outcomes)
Since there are only 5 models, I took them out of sapply and called analyze.features once for each model, in succession. The job completed without error. I don't know why; I'm guessing it has something to do with the parallelization happening within the sapply function.
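A minimal sketch of calling the function once per model in succession, written as an explicit loop. Everything defined here is a stand-in so the example runs on its own: the two `lm` models, the `mtcars` test data, and the placeholder `analyze.features` (which only mimics the argument order of the real function) are assumptions, not the original objects.

```r
# Hypothetical stand-ins for the real models and test data.
amodels <- list(
  model_a = lm(mpg ~ wt, data = mtcars),
  model_b = lm(mpg ~ wt + hp, data = mtcars)
)
test.data <- mtcars[c("wt", "hp")]
test.outcomes <- mtcars$mpg

# Placeholder with the same arguments as the real function; it just
# returns the predictor names used by the chosen model.
analyze.features <- function(newdata, newoutcomes, model.name) {
  model <- amodels[[model.name]]
  names(coef(model))[-1]
}

# One call per model, one after another, instead of a single sapply call.
kept <- list()
for (model.name in names(amodels)) {
  kept[[model.name]] <- analyze.features(
    newdata = test.data,
    newoutcomes = test.outcomes,
    model.name = model.name
  )
}
str(kept)
```

With only a handful of models, the loop costs nothing in readability and each model's result is stored as soon as its call finishes.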
r foreach parallel-processing