Tuesday, 15 March 2011

r - Build a proper dataframe from a matrix list after importing .xlsx file -



r - Build a proper dataframe from a matrix list after importing .xlsx file -

implemented:

i importing .xlsx file r. file consists of 3 sheets. binding sheets list.

need implement

now want combine matrix lists single data.frame. header beingness --> names(dataset).

i tried using as.data.frame read.xlsx given in help did not work. explicitly tried as.data.frame(as.table(dataset)) still generates long list of data.frame nil want.

i want have construction header = names , values below that, how read.table imports data.

this code using:

xlfile <- list.files(pattern = "*.xlsx") wb <- loadworkbook(xlfile) sheet_ct <- wb$getnumberofsheets() b <- rbind(list(lapply(1:sheet_ct, function(x) { res <- read.xlsx(xlfile, x, as.data.frame = true, header = true) }))) b <- b [-c(1),] # want remove sec header

i want have info arrangement below.

ei mi hours nphy cphy chlphy nhet chet ndet cdet don doc din dic @ dccho tepc ncocco ccocco chlcocco piccocco par temp sal co2atm u10 dicfl co2ppm co2mol ph 1 1 1 1 0.1023488 0.6534707 0.1053458 0.04994161 0.3308593 0.04991916 0.3307085 0.05042275 49.76304 14.99330000 2050.132 2150.007 0.9642220 0.1339044 0.1040715 0.6500288 0.1087667 0.1000664 0.0000000 9.900000 31.31000 370 0.01 -2.963256000 565.1855 0.02562326 7.879427 2 1 1 2 0.1045240 0.6448216 0.1103250 0.04988347 0.3304699 0.04984045 0.3301691 0.05085697 49.52745 14.98729000 2050.264 2150.007 0.9308690 0.1652179 0.1076058 0.6386706 0.1164099 0.1001396 0.0000000 9.900000 31.31000 370 0.01 -2.971632000 565.7373 0.02564828 7.879042 3 1 1 3 0.1064772 0.6369597 0.1148174 0.04982555 0.3300819 0.04976363 0.3296314 0.05130091 49.29323 14.98221000 2050.396 2150.007 0.8997098 0.1941872 0.1104229 0.6291149 0.1225822 0.1007908 0.8695131 9.900000 31.31000 370 0.01 -2.980446000 566.3179 0.02567460 7.878636 4 1 1 4 0.1081702 0.6299084 0.1187672 0.04976784 0.3296952 0.04968840 0.3290949 0.05175249 49.06034 14.97810000 2050.524 2150.007 0.8705440 0.2210289 0.1125141 0.6213265 0.1273103 0.1018360 1.5513170 9.900000 31.31000 370 0.01 -2.989259000 566.8983 0.02570091 7.878231 5 1 1 5 0.1095905 0.6239005 0.1221460 0.04971029 0.3293089 0.04961446 0.3285598 0.05220978 48.82878 14.97485000 2050.641 2150.007 0.8431960 0.2459341 0.1140222 0.6152447 0.1308843 0.1034179 2.7777070 9.900000

please dont suggest me have info on single sheet , convert .xlsx .csv or simple text format. trying hard have proper dataframe .xlsx file.

following file

and post next : followup

this resulted:

str(full_data) 'data.frame': 0 obs. of 19 variables: $ experiment : factor w/ 2 levels "#","1": $ mesocosm : factor w/ 10 levels "#","1","2","3",..: $ exp.day : factor w/ 24 levels "1","10","11",..: $ hr : factor w/ 24 levels "108","12","132",..: $ temperature: factor w/ 125 levels "10","10.01","10.02",..: $ salinity : num $ ph : num $ dic : factor w/ 205 levels "1582.2925","1588.6475",..: $ ta : factor w/ 117 levels "1813","1826",..: $ din : factor w/ 66 levels "0.2","0.3","0.4",..: $ chl.a : factor w/ 156 levels "0.171","0.22",..: $ pic : factor w/ 194 levels "-0.47","-0.96",..: $ poc : factor w/ 199 levels "-0.046","1.733",..: $ pon : factor w/ 151 levels "1.675","1.723",..: $ pop : factor w/ 110 levels "0.032","0.034",..: $ doc : factor w/ 93 levels "100.1","100.4",..: $ don : factor w/ 1 level "µmol/l": $ dop : factor w/ 1 level "µmol/l": $ tep : factor w/ 100 levels "10.4934","11.0053",..: [note: above construction after reading .xlsx file......the levels makes calculation , manipulation part tedious , messy.]

this want achieve:

str(a)

'data.frame': 9936 obs. of 29 variables: $ ei : int 1 1 1 1 1 1 1 1 1 1 ... $ mi : int 1 1 1 1 1 1 1 1 1 1 ... $ hours : int 1 2 3 4 5 6 7 8 9 10 ... $ cphy : num 0.653 0.645 0.637 0.63 0.624 ... $ chlphy : num 0.105 0.11 0.115 0.119 0.122 ... $ nhet : num 0.0499 0.0499 0.0498 0.0498 0.0497 ... $ chet : num 0.331 0.33 0.33 0.33 0.329 ... $ ndet : num 0.0499 0.0498 0.0498 0.0497 0.0496 ... $ cdet : num 0.331 0.33 0.33 0.329 0.329 ... $ don : num 0.0504 0.0509 0.0513 0.0518 0.0522 ... $ doc : num 49.8 49.5 49.3 49.1 48.8 ... $ din : num 15 15 15 15 15 ... $ dic : num 2050 2050 2050 2051 2051 ... $ @ : num 2150 2150 2150 2150 2150 ... $ dccho : num 0.964 0.931 0.9 0.871 0.843 ... $ tepc : num 0.134 0.165 0.194 0.221 0.246 ... $ ncocco : num 0.104 0.108 0.11 0.113 0.114 ... $ ccocco : num 0.65 0.639 0.629 0.621 0.615 ... $ chlcocco: num 0.109 0.116 0.123 0.127 0.131 ... $ piccocco: num 0.1 0.1 0.101 0.102 0.103 ... $ par : num 0 0 0.87 1.55 2.78 ... $ temp : num 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9 ... $ sal : num 31.3 31.3 31.3 31.3 31.3 ... $ co2atm : num 370 370 370 370 370 370 370 370 370 370 ... $ u10 : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ... $ dicfl : num -2.96 -2.97 -2.98 -2.99 -3 ... $ co2ppm : num 565 566 566 567 567 ... $ co2mol : num 0.0256 0.0256 0.0257 0.0257 0.0257 ... $ ph : num 7.88 7.88 7.88 7.88 7.88 ...

[note: sorry columns, dataset (simple text), reading read.table]

with na's handled:

> unique(mydf_1$exp.num) [1] # 1 levels: # 1 > unique(mydf_2$exp.num) [1] # 2 levels: # 2 > unique(mydf_3$exp.num) [1] # 3 levels: # 3 > unique(full_data$exp.num) [1] 2 3 4

without handling na's:

> unique(full_data$exp.num) [1] 1 na 2 3 > unique(full_data$mesocosm) [1] 1 2 3 4 5 6 7 8 9 na

i think need. add together few comments on doing:

xlfile <- list.files(pattern = "*.xlsx") wb <- loadworkbook(xlfile) sheet_ct <- wb$getnumberofsheets() for( in 1:sheet_ct) { #read sheets 3 separate dataframes (mydf_1, mydf_2, mydf3) print(i) variable_name <- sprintf('mydf_%s',i) assign(variable_name, read.xlsx(xlfile, sheetindex=i,startrow=1, endrow=209)) #using don't need utilize formula eliminate nas. need specify first , lastly rows. } colnames(mydf_1) <- names(mydf_2) #this here unclear. chose sec sheet's # names column names can chose whichever want using same (second , 3rd column had same names). #some of sheets loaded few blank rows (full of nas) remove #with next function according first column populated #according see remove_na_rows <- function(x) { x <- x[!is.na(x)] <- length(x==true) } mydf_1 <- mydf_1[1:remove_na_rows(mydf_1$exp.num),] mydf_2 <- mydf_2[1:remove_na_rows(mydf_2$exp.num),] mydf_3 <- mydf_3[1:remove_na_rows(mydf_3$exp.num),] full_data <- rbind(mydf_1[-1,],mydf_2[-1,],mydf_3[-1,]) #making 1 dataframe here full_data <- lapply(full_data,function(x) as.numeric(x)) #convert fields numeric full_data2$ei <- as.integer(full_data[['ei']]) #use convert column integer full_data2$mi <- as.integer(full_data[['mi']]) full_data2$hours <- as.integer(full_data[['hours']]) #*********code utilize removing na rows ***************** #so if rbind not caring na rows can utilize below rid of them #i tested , seems working n_row <- null ( in 1:nrow(full_data)) { x <- full_data[i,] if ( all(is.na(x)) ) { n_row <- append(n_row,i) } } full_data <- full_data[-n_row,]

i think need

r excel data.frame xlsx

No comments:

Post a Comment