Wednesday 15 September 2010

Cleaning column data in R




Hi, I wrote a function to clean data in R:

periodcleanse <- function(x) {
  if (x == "") {
    return("")
  } else if (substr(x, nchar(x), nchar(x)) == "m") {
    return(30 * as.numeric(substr(x, 1, nchar(x) - 1)))
  } else if (substr(x, nchar(x), nchar(x)) == "y") {
    return(365 * as.numeric(substr(x, 1, nchar(x) - 1)))
  } else if (substr(x, nchar(x), nchar(x)) == "d") {
    return(as.numeric(substr(x, 1, nchar(x) - 1)))
  }
}

My df looks like this:

period
3m
5y
1d
7m

I want to call:

df$period <- periodcleanse(df$period)

but I am getting:

Warning message:
In if (x == "") { :
  the condition has length > 1 and only the first element will be used

and nothing happens. What should I do?
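The warning comes from `if` itself: it expects a single TRUE or FALSE, so when it is handed a whole column it only looks at the first element (and from R 4.2.0 onwards this is an error rather than a warning). A minimal sketch of one way around it is to keep the one-value-at-a-time logic and apply it to each element with vapply. The name periodcleanse_scalar, the sample data, and the choice to return NA instead of "" for empty entries are assumptions made for illustration, not part of the original post.

```r
# Scalar helper (hypothetical name): converts one period string to days.
# Assumption: empty or missing entries become NA so the result stays numeric.
periodcleanse_scalar <- function(x) {
  if (is.na(x) || x == "") return(NA_real_)
  unit  <- substr(x, nchar(x), nchar(x))            # last character: m, y or d
  value <- as.numeric(substr(x, 1, nchar(x) - 1))   # everything before it
  # Unknown units fall through to the unnamed default, NA.
  switch(unit, m = 30 * value, y = 365 * value, d = value, NA_real_)
}

# Sample data; the empty string is an assumption, added to exercise that case.
df <- data.frame(period = c("3m", "5y", "", "1d", "7m"),
                 stringsAsFactors = FALSE)

# vapply calls the helper once per element and checks each result is one number.
df$days <- vapply(df$period, periodcleanse_scalar, numeric(1), USE.NAMES = FALSE)
df$days
```

This still loops over the column one element at a time, so it is convenient rather than fast.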

I would create a vectorized function, both to save writing endless if/else branches and to avoid running it in a loop (sapply):

periodcleanse2 <- function(x) {
  # lookup table; this part can be moved out of the function to improve speed
  matchdat <- data.frame(a = c("m", "y", "d"), b = c(30, 365, 1))
  indx <- gsub("\\d", "", x)                  # unit suffix
  indx2 <- as.numeric(gsub("[a-z]", "", x))   # numeric part
  matchdat$b[match(indx, matchdat$a)] * indx2
}

periodcleanse2(df$period)
## [1] 90 1825 NA 1 210
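As a variation on the same lookup idea, a named numeric vector can stand in for the data frame, and the result can be written straight back into the column. This is only a sketch: period_to_days and days_per_unit are hypothetical names, and the sample data (including the empty entry) is assumed for illustration.

```r
# Days per unit; 30 and 365 are the same approximations used above.
days_per_unit <- c(d = 1, m = 30, y = 365)

# Hypothetical helper: vectorized conversion of strings like "3m" to days.
period_to_days <- function(x) {
  x     <- as.character(x)                      # also copes with factor columns
  value <- suppressWarnings(as.numeric(sub("[a-z]$", "", x)))  # numeric part
  unit  <- sub("^[0-9.]*", "", x)               # whatever follows the number
  # Indexing by name returns NA for empty or unknown units.
  unname(value * days_per_unit[unit])
}

# Sample data; the empty string is an assumption.
df <- data.frame(period = c("3m", "5y", "", "1d", "7m"))
df$period <- period_to_days(df$period)
df$period
```

Empty or unrecognised entries come back as NA rather than raising an error, which may or may not be what you want for your data.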

