Wednesday 15 August 2012

R - Finding values in one vector that are between the values in another vector




I need help finding the values in a vector that fall between key values, non-inclusive.

For example, take the following vectors x and y:

x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

I'd like to find the values in y that lie between consecutive values of x and are not equal to any value in x. The result would be:

list(y[y > 2 & y < 6], y[y > 6 & y < 10])
# [[1]]
# [1] 4 3
#
# [[2]]
# [1] 7 9

So in the above result:

3 and 4 are between 2 and 6
7 and 9 are between 6 and 10
12 is not between anything, so it is excluded
6 is equal to 6, so it is excluded

I've been working on this for a little while and I'm stumped. I'd show my code, but it's just plain ugly.

How can I find the values in one vector that are between the values in another vector?

Maybe this will work for you:

lapply(split(y[y > min(x) & y < max(x)],
             findInterval(y[y > min(x) & y < max(x)], x)),
       function(z) z[!z %in% x])
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9
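To see why splitting on findInterval works, it may help to look at what it returns on its own. This is only a small base R sketch using the x and y from the question; the names inside and idx are illustrative and not part of the answer above.

```r
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

# Keep only the values strictly inside the overall range of x
inside <- y[y > min(x) & y < max(x)]

# For each value v, findInterval() returns the index i with x[i] <= v < x[i + 1]
idx <- findInterval(inside, x)

inside
# [1] 7 9 4 6 3
idx
# [1] 2 2 1 2 1
```

Note that 6 lands in interval 2 because the intervals are closed on the left, which is why the answer still needs the !z %in% x step after splitting.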

Of course, you might improve this to keep it DRY and subset "y" before splitting, for example by using between (or %between%) from "data.table":

library(data.table)
z <- y[y %between% range(x) & !y %in% x]
split(z, findInterval(z, x))
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9

Update

For reference, the three options so far are all pretty fast:

set.seed(1)
x <- sort(sample(100000, 20, FALSE))
y <- sample(100000, 100000, TRUE)

# Subset once with %between% (data.table), then split by interval
am <- function(x, y) {
  z <- y[y %between% range(x) & !y %in% x]
  split(z, findInterval(z, x))
}

# Enumerate the integers strictly inside each interval and match against y
da <- function(x, y) {
  indx <- Map(function(x, z) x + seq_len(z), x[-length(x)], diff(x) - 1)
  lapply(indx, function(x) y[y %in% x])
}

# A sign change from -1 to 1 between consecutive rows marks x[i] < y < x[i + 1]
user <- function(x, y) {
  m <- t(diff(sign(outer(x, y, "-"))) == 2)
  split((m * y)[m], col(m)[m])
}

library(microbenchmark)
microbenchmark(am(x, y), da(x, y), user(x, y))
# Unit: milliseconds
#        expr       min        lq      mean    median        uq      max neval
#    am(x, y)  22.58939  23.24731  26.29092  23.79639  25.64548 140.5610   100
#    da(x, y) 149.46997 157.48534 162.47526 160.01823 164.74851 287.0808   100
#  user(x, y) 327.38835 437.44064 445.71955 446.65938 467.97784 637.3121   100
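Before comparing timings, it can be worth checking that the three approaches return the same groups. The sketch below does that on the small x and y from the question. It is an assumption-laden stand-in rather than the benchmarked code: am_base is a hypothetical base R version of am() that swaps data.table's %between% for explicit comparisons, so the block runs without extra packages.

```r
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

# Hypothetical base R stand-in for am(): explicit comparisons replace %between%
am_base <- function(x, y) {
  z <- y[y >= min(x) & y <= max(x) & !y %in% x]
  split(z, findInterval(z, x))
}

# Same definitions as in the benchmark
da <- function(x, y) {
  indx <- Map(function(x, z) x + seq_len(z), x[-length(x)], diff(x) - 1)
  lapply(indx, function(x) y[y %in% x])
}

user <- function(x, y) {
  m <- t(diff(sign(outer(x, y, "-"))) == 2)
  split((m * y)[m], col(m)[m])
}

# Drop list names so that only the grouped values are compared
results <- lapply(list(am_base(x, y), da(x, y), user(x, y)), unname)
all_agree <- all(vapply(results, identical, logical(1), results[[1]]))
all_agree
# [1] TRUE
```

Agreement here does not guarantee agreement everywhere. da() enumerates the integers inside each interval, so it assumes integer-valued data, and when an interval contains no values the lapply-based version keeps an empty element while the split-based versions drop that group.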

r
