R - How to return the positions of the first occurrence for (different) duplicated rows in a data.frame?
Suppose I have a data frame like the following:
dfiris <- rbind(iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5], iris[1:5, -5])
Since the first 5 rows are repeated 4 more times, how can I efficiently get, for each row, the position of its first occurrence:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 1  2  3  4  5  1  2  3  4  5  1  2  3  4  5  1  2  3  4  5  1  2  3  4  5
The function duplicated() does not help me, because it only returns TRUE from the second occurrence onward of a duplicated row; it does not say which earlier row is being duplicated.
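To make the limitation concrete, here is a small sketch run on the example data from the question. It only illustrates what duplicated() reports; the variable name dup is chosen here for illustration.

```r
# Rebuild the example data: the first five iris rows stacked five times.
dfiris <- rbind(iris[1:5, -5], iris[1:5, -5], iris[1:5, -5],
                iris[1:5, -5], iris[1:5, -5])

# duplicated() flags a row only if an identical row appears earlier.
dup <- duplicated(dfiris)

# The first five rows are first occurrences (FALSE); every later copy
# is TRUE, with no indication of which earlier row it repeats.
print(dup)
print(table(dup))
```

So the logical vector separates first occurrences from repeats, but the mapping from each repeat back to its original row still has to be computed some other way.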
My (inefficient) solution:
apply(dfiris, 1, function(df) { which(apply(unique(dfiris), 1, function(df_u) identical(df, df_u))) })
There must be a quicker way than that. Any suggestions?
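Using data.table:
library(data.table)
# Convert in place, keeping the original row names in a column called rn
setDT(dfiris, keep.rownames=TRUE)
# Group by all value columns; .I[1] is the row index of the first row in each group
print(setkey(dfiris[, list(rn=as.numeric(rn), firstocc=.I[1]), by=c(names(dfiris)[-1])], rn))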
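For comparison, a base R sketch of the same idea, which is not part of the answer above: build one string key per row and let match() return the position of the first row with that key. The names key and first_occ are made up for this example, and the approach assumes that pasting the columns together identifies a row uniquely, which may not hold for floating-point values that differ only beyond the printed precision.

```r
# Rebuild the example data so the sketch runs on its own.
dfiris <- rbind(iris[1:5, -5], iris[1:5, -5], iris[1:5, -5],
                iris[1:5, -5], iris[1:5, -5])

# One string per row; "\r" is an assumed separator unlikely to occur in the data.
key <- do.call(paste, c(dfiris, sep = "\r"))

# match(key, key) gives, for each row, the index of the first row with the same key.
first_occ <- setNames(match(key, key), rownames(dfiris))
print(first_occ)
```

Because match() is vectorised, this avoids the nested apply() calls from the question, although it has not been benchmarked here against the data.table version.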
r data.frame duplicates