Wednesday, 15 September 2010

R subset with condition using %in% or ==. Which one should be used?




This question already has an answer here:

Subsetting a data frame in R with multiple conditions (7 answers)

Usually, if I want to subset a data frame conditioning on some values of a variable, I use subset and %in%:

x <- data.frame(u=1:10,v=letters[1:10])
x
subset(x, v %in% c("a","d"))

Now, I found out that == gives the same result:

subset(x, v == c("a","d"))

I'm wondering whether the two are identical or whether there is a reason to prefer one over the other. Thanks for any help.

Edit (@MrFlick): This question is not the same as the one linked here, which asks how to exclude several values: (!x %in% c('a','b')). I asked why I got the same result whether I use == or %in%.

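You should use the first one, %in%. You got the same result with == only because, in the example dataset, the rows happen to line up with the order in which "a", "d" is recycled. With ==, the shorter vector is repeated to the length of x$v and compared element by element, so here you are effectively comparing x$v against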

rep(c("a", "d"), length.out= nrow(x))
# [1] "a" "d" "a" "d" "a" "d" "a" "d" "a" "d"

x$v==rep(c("a", "d"), length.out= nrow(x)) # matches only because of coincidence
#[1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

subset(x, v == c("d","a"))
#[1] u v
#<0 rows> (or 0-length row.names)

With the order reversed, as in the subset call above, the comparison becomes

x$v==rep(c("d", "a"), length.out= nrow(x))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

whereas %in% works regardless of the order:

subset(x, v %in% c("d","a"))
#  u v
#1 1 a
#4 4 d
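
To make the difference explicit, here is a minimal sketch that puts the two comparisons side by side. It rebuilds the example data frame with stringsAsFactors = FALSE (an assumption, so that v is a plain character vector), and the names wanted, by_position, by_membership and via_match are only illustrative. The match() line reflects how base R documents %in%.

```r
# Example data mirroring the question (stringsAsFactors = FALSE is an assumption)
x <- data.frame(u = 1:10, v = letters[1:10], stringsAsFactors = FALSE)
wanted <- c("d", "a")  # illustrative name, not from the original post

# `==` recycles the shorter vector to the length of x$v
# and then compares the two vectors position by position
recycled <- rep(wanted, length.out = nrow(x))
by_position <- x$v == recycled

# `%in%` checks, for each element of x$v, whether it occurs anywhere in `wanted`
by_membership <- x$v %in% wanted

# Base R documents `%in%` as equivalent to this match() call
via_match <- match(x$v, wanted, nomatch = 0) > 0

identical(by_position, x$v == wanted)  # TRUE: `==` did the recycling itself
identical(by_membership, via_match)    # TRUE
which(by_position)                     # integer(0): no positional match
which(by_membership)                   # 1 4: rows holding "a" and "d"
```

Note also that == only warns about recycling when the longer length is not a multiple of the shorter one, so with 10 rows and 2 values it fails silently.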

