R - dplyr: count distinct values in a readable way
I'm new to dplyr and need to calculate the number of distinct values within a group. Here's an example table:

    data = data.frame(aa = c(1, 2, 3, 4, NA), bb = c('a', 'b', 'a', 'c', 'c'))

I know I can do things like:

    by_bb <- group_by(data, bb, add = TRUE)
    summarise(by_bb, mean(aa, na.rm = TRUE), max(aa), sum(!is.na(aa)), length(aa))
But what if I want to count the unique elements?

I can do:

    > summarise(by_bb, length(unique(unlist(aa))))
      bb length(unique(unlist(aa)))
    1  a                          2
    2  b                          1
    3  c                          2
And if I want to exclude NAs, I can do:

    > summarise(by_bb, length(unique(unlist(aa[!is.na(aa)]))))
      bb length(unique(unlist(aa[!is.na(aa)])))
    1  a                                      2
    2  b                                      1
    3  c                                      1

But it looks a little unreadable to me. Is there a better way to do this kind of summarization?
How about this option:

    data %>%                                       # take the data.frame "data"
      filter(!is.na(aa)) %>%                       # using "data", filter out all rows with NAs in aa
      group_by(bb) %>%                             # then, with the filtered data, group it by "bb"
      summarise(unique_elements = n_distinct(aa))  # now summarise with unique elements per group
    #Source: local data frame [3 x 2]
    #
    #  bb unique_elements
    #1  a               2
    #2  b               1
    #3  c               1
Use filter to drop any rows where aa is NA, then group the data by column bb, and finally summarise by counting the number of unique elements of column aa within each group of bb.
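As a possible variation on the steps above, n_distinct() also accepts an na.rm argument, so the NA handling can live inside the summary instead of a separate filter() step. The following is only a sketch on the question's example data; the variable name result is introduced here for illustration.

```r
library(dplyr)

# Same example data as in the question
data <- data.frame(aa = c(1, 2, 3, 4, NA),
                   bb = c("a", "b", "a", "c", "c"))

# Let n_distinct() drop the NAs itself instead of filtering rows first
result <- data %>%
  group_by(bb) %>%
  summarise(unique_elements = n_distinct(aa, na.rm = TRUE))

result
```

One difference worth keeping in mind: filter() removes a group entirely if all of its aa values are NA, whereas na.rm = TRUE keeps that group in the output with a count of 0.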
As you can see, I'm making use of the pipe operator %>%, which you can use to "pipe" or "chain" commands together when using dplyr. This helps you write readable code because it reads more naturally: you write code from left to right and top to bottom, rather than nested from the inside out (as in your example code).
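To make the contrast concrete, here is a small sketch that writes the same computation both ways on the question's example data. The names nested and piped are made up for this illustration.

```r
library(dplyr)

# Same example data as in the question
data <- data.frame(aa = c(1, 2, 3, 4, NA),
                   bb = c("a", "b", "a", "c", "c"))

# Nested form: has to be read from the innermost call outwards
nested <- summarise(group_by(filter(data, !is.na(aa)), bb),
                    unique_elements = n_distinct(aa))

# Piped form: reads top to bottom, each step receives the previous result
piped <- data %>%
  filter(!is.na(aa)) %>%
  group_by(bb) %>%
  summarise(unique_elements = n_distinct(aa))

# Check that both forms produce the same counts
identical(nested$unique_elements, piped$unique_elements)
```

Both versions call the same three verbs in the same logical order; the pipe only changes how the calls are written down.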
In the first part of your question, you wrote:

I know I can do things like:

    by_bb <- group_by(data, bb, add = TRUE)
    summarise(by_bb, mean(aa, na.rm = TRUE), max(aa), sum(!is.na(aa)), length(aa))
Here's another option (applying a number of functions to the same column(s)):

    data %>%
      filter(!is.na(aa)) %>%
      group_by(bb) %>%
      summarise_each(funs(mean, max, sum, n_distinct), aa)
    #Source: local data frame [3 x 5]
    #
    #  bb mean max sum n_distinct
    #1  a    2   3   4          2
    #2  b    2   2   2          1
    #3  c    4   4   4          1
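summarise_each() and funs() are deprecated in newer dplyr releases. Assuming dplyr 1.0.0 or later is installed, roughly the same summary can be sketched with across(). The variable name stats and the .names pattern are choices made for this illustration, not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
data <- data.frame(aa = c(1, 2, 3, 4, NA),
                   bb = c("a", "b", "a", "c", "c"))

# Assumes dplyr >= 1.0.0, where across() is available
stats <- data %>%
  filter(!is.na(aa)) %>%
  group_by(bb) %>%
  summarise(across(aa,
                   list(mean = mean, max = max, sum = sum,
                        n_distinct = n_distinct),
                   .names = "{.fn}"))  # name each column after its function only

stats
```

Without the .names argument, across() would prefix each column with the source column name (aa_mean, aa_max, and so on), which is usually preferable once more than one column is summarised.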
r dplyr summarization