Sunday, 15 June 2014

R - Intradataframe Analysis--creating a derivative data frame from another data frame




This may be a slightly obtuse question title, since I'm still getting up to speed with R. I'm doing some data frame manipulation to extract percentages for classification groups, where one column holds the classification factor and another column holds the values I want percentages from. I'll use the built-in mtcars to demonstrate what I'm trying to achieve, with gear playing the role of the classification variable and cyl being the data I'm trying to get percentages from.

Just some background details to smooth the question:

The gear column spans 3 distinct values: 3, 4, 5. The cyl column spans 3 distinct values as well: 4, 6, 8.

The first element of the list produced below gives the proportion of each gear type that has at most 4 cylinders. For 3-gear models there is only one, the Toyota Corona, out of a total of 15 3-gear models, so the proportion should be 1/15 = 0.0667. For 4-gear models there are 8 out of a total of 12 4-gear models, which yields 8/12 = 0.667.
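The arithmetic above can be checked with a quick cross-tabulation. This is only a sketch: the names `counts`, `gear_totals` and `share_4cyl` are made up for illustration. Since 4 is the smallest cyl value, "at most 4 cylinders" is the same as "exactly 4 cylinders" here.

```r
# Cross-tabulate cylinders (rows) against gears (columns) in the built-in mtcars data.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Number of cars in each gear group (column totals of the table).
gear_totals <- colSums(counts)

# Share of each gear group that has 4 cylinders, e.g. 1/15 for the 3-gear models.
share_4cyl <- counts["4", ] / gear_totals

print(counts)
print(share_4cyl)
```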

Now here's the method I wrote for the computation. The structure of the output is not what I want. I'd rather get a data frame with the first column being the distinct cyl values and the other columns being the 3, 4, and 5 gear types, with the rows holding the various percentages. I'm close, but I need help either reshaping the list to get there, or perhaps using an alternative apply function to produce the table of percentages I'm after, or any other magic you can cook up.

> lapply( unique( sort( mtcars$cyl ) ) , function(c) { tapply( mtcars$cyl , mtcars$gear , function(x) sum( x <= c ) / length(x) ) } )
[[1]]
         3          4          5
0.06666667 0.66666667 0.40000000

[[2]]
  3   4   5
0.2 1.0 0.6

[[3]]
3 4 5
1 1 1

This is what I expect the data frame I want to look like:

  cyl         x3        x4  x5
1   4 0.06666667 0.6666667 0.4
2   6 0.20000000 1.0000000 0.6
3   8 1.00000000 1.0000000 1.0
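One possible way to get a table of this shape, sketched here under my own assumptions rather than taken from any answer: compute column-wise proportions with `prop.table` and accumulate them down each gear column. The names `props`, `cum_props` and `result` are illustrative only.

```r
# Proportion of each cyl value within each gear group (every column sums to 1).
props <- prop.table(table(cyl = mtcars$cyl, gear = mtcars$gear), margin = 2)

# Cumulative sum down each column: share of cars with cyl <= the row's value.
cum_props <- apply(props, 2, cumsum)

# Put the cyl values in a first column; data.frame() turns the
# column names "3", "4", "5" into the syntactic names X3, X4, X5.
result <- data.frame(cyl = as.numeric(rownames(cum_props)), cum_props)
rownames(result) <- NULL

print(result)
```

This relies on `table` ordering the cyl values ascending, so that the cumulative sum matches the `x <= c` test in the original code.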

I came up with a solution after googling "convert list of arrays data.frame", which led me to a post on that topic.

> p <- lapply( unique( sort( mtcars$cyl ) ) , function(c) { tapply( mtcars$cyl , mtcars$gear , function(x) sum( x <= c ) / length(x) ) } )
> df <- data.frame( matrix( unlist(p) , nrow = length(p) , byrow = TRUE ) )
> df
          X1        X2  X3
1 0.06666667 0.6666667 0.4
2 0.20000000 1.0000000 0.6
3 1.00000000 1.0000000 1.0

The solution works apart from dropping the classification names from the column headers, but it looks like a follow-up assignment can recover those as well...

> colnames(df) <- names(p[[1]])
> rownames(df) <- unique( sort( mtcars$cyl ) )
> df
           3         4   5
4 0.06666667 0.6666667 0.4
6 0.20000000 1.0000000 0.6
8 1.00000000 1.0000000 1.0

Actually, the other answers to the linked question nicely address the column headers issue, but the row header problem remains, since those values are lost in the anonymous function calls.
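A minimal sketch of one way to keep the row labels, assuming it is acceptable to name the vector of cyl values before handing it to `lapply` (which carries the names of its input over to the result). The names `cyl_levels`, `m` and `out` are hypothetical, and the loop variable is called `k` instead of `c` to avoid shadowing the base function.

```r
# Distinct cyl values, named after themselves so lapply() keeps them as list names.
cyl_levels <- sort(unique(mtcars$cyl))
names(cyl_levels) <- cyl_levels

p <- lapply(cyl_levels, function(k) {
  # Per gear group: share of cars with at most k cylinders.
  tapply(mtcars$cyl, mtcars$gear, function(x) sum(x <= k) / length(x))
})

# Stack the named vectors: list names become row names, gear values become column names.
m <- do.call(rbind, p)

# Move the row labels into a proper cyl column.
out <- data.frame(cyl = as.numeric(rownames(m)), m)
rownames(out) <- NULL

print(m)
print(out)
```

With this arrangement neither the column nor the row labels need to be reassigned by hand afterwards.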

r data.frame reshape
