Friday, 15 May 2015

r - How to return the positions of first occurrence for (different) duplicated rows in a data.frame? -




Suppose I have a data frame like the following:

dfiris <- rbind(iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5])

Since the first 5 rows are repeated another 4 times, how can I efficiently get the position of each row's first occurrence, i.e.:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 1  2  3  4  5  1  2  3  4  5  1  2  3  4  5  1  2  3  4  5  1  2  3  4  5

The function duplicated() does not help me, because it only returns TRUE from the second occurrence onward of a duplicated row; it does not say which earlier row the duplicate matches.
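To make the limitation concrete, here is a small sketch using the same example data (rebuilt here so the snippet runs on its own). It only illustrates what duplicated() reports; the variable name dup is just for illustration.

```r
# Rebuild the example data: the first 5 iris rows (without Species) stacked 5 times.
dfiris <- rbind(iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5])

# duplicated() flags every row that has already appeared earlier in the data frame.
dup <- duplicated(dfiris)

# Rows 1-5 are first occurrences (FALSE); rows 6-25 are repeats (TRUE).
print(dup)

# The flag alone carries no information about which earlier row was matched.
print(sum(dup))
```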

My (inefficient) solution:

apply(dfiris, 1, function(df) { which(apply(unique(dfiris), 1, function(df_u) identical(df, df_u))) })

There must be a quicker way than that. Any suggestions?

Using data.table:

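library(data.table)
# Convert to a data.table by reference, keeping the row names in a column called "rn"
setDT(dfiris, keep.rownames = TRUE)
# Group by all original columns; .I[1] is the row index of the first row in each group
print(setkey(dfiris[, list(rn = as.numeric(rn), firstocc = .I[1]), by = c(names(dfiris)[-1])], rn))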
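For comparison, a base R sketch that avoids the nested apply() calls. This is not part of the answer above; it is one possible alternative, assuming that pasting each row into a single string is an acceptable way to compare rows (values that differ only beyond the printed precision would be treated as equal). The names key, firstocc and firstocc_named are made up for this example.

```r
# Rebuild the example data as a plain data frame so the snippet is self-contained.
dfiris <- rbind(iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5])

# Collapse each row into one string; "\r" is used as a separator that is
# unlikely to occur inside the values themselves.
key <- do.call(paste, c(dfiris, sep = "\r"))

# match() returns the position of the FIRST match, so matching the keys
# against themselves gives the first-occurrence index of every row.
firstocc <- match(key, key)

# Optionally label the result with the row positions, as in the question.
firstocc_named <- setNames(firstocc, seq_along(firstocc))
print(firstocc_named)
```

Because match() is vectorised, this does a single pass over the keys instead of comparing every row against every unique row.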

r data.frame duplicates
