Monday 15 April 2013

Aggregate ordinal and binary data according to cluster in R

I performed a k-medoids clustering analysis in R using the cluster package from CRAN. My data are in a data.frame called df4 with 13111 observations of 11 binary and ordinal variables. After clustering, I attached the cluster results to the original data.frame, so each user ID now shows its corresponding cluster number.

How can I aggregate the binary and ordinal choices according to cluster?

For example, the gender variable has male/female values, and age has the ranges "18-20", "21-24", "25-34", "35-44", "45-54", "55-64", and "65+". For gender, I want the number of male and female values in each cluster. For age, I want the count of each category in each cluster.

Here's the head of the data.frame with the cluster label column:

```
# 12 variables because I added the clustering object to the data.frame
# (only 2 variables of the R output are shown)
> str(df4)
'data.frame':   13111 obs. of  12 variables:
 $ age   : Factor w/ 7 levels "18-20","21-24",..: 6 6 6 6 7 6 5 7 6 3 ...
 $ gender: Factor w/ 2 levels "female","male": 1 1 2 2 2 1 2 1 2 2 ...

# (only part of the R output is shown)
> head(df4)
    age gender
1 55-64 female
2 55-64 female
3 55-64   male
4 55-64   male
5   65+   male
6 55-64 female
```

Here's a reproducible example with a similar dataset:

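```r
age <- c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+")
gender <- c("female", "female", "male", "male", "male", "male", "female")
# stringsAsFactors = TRUE keeps age and gender as factors on R >= 4.0 too
smalldf <- data.frame(age, gender, stringsAsFactors = TRUE)

# import the cluster package
library(cluster)

# create the dissimilarity matrix:
# Gower coefficient for the distance between mixed variables
# (column 2, gender, as symmetric binary; column 1, age, as ordinal)
smalldaisy4 <- daisy(smalldf, metric = "gower",
                     type = list(symm = c(2), ordratio = c(1)))

# set the randomization seed
set.seed(1)

# PAM algorithm with 3 clusters on the precomputed dissimilarities
smallk4answers <- pam(smalldaisy4, 3, diss = TRUE)

# apply the cluster IDs to the original data frame
# (pam() stores them in the "clustering" component)
smalldf$cluster <- smallk4answers$clustering
```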

Desired output (hypothetical):

```
  cluster female male 18-20 21-24 25-34 35-44 45-54 55-64 65+
1       1      1    1     1     2     1     0     3     1   0
2       2      2    1     1     1     0     1     2     0   0
3       3      0    1     1     1     1     1     0     2   3
```
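Since the question asks about aggregate(), here is another way to get a table of this shape, including the cluster column. Expand every factor into 0/1 indicator columns, then sum those columns within each cluster. The sketch below is self-contained. `toydf` and its cluster ids are hypothetical stand-ins for smalldf plus the pam() result. Calling model.matrix() with `contrasts = FALSE` is one common way to keep every level of each factor instead of dropping a baseline level.

```r
# Hypothetical toy data: the cluster ids stand in for pam()$clustering
toydf <- data.frame(
  age     = c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  gender  = c("female", "female", "male", "male", "male", "male", "female"),
  cluster = c(1, 1, 2, 2, 2, 2, 3),
  stringsAsFactors = TRUE
)

vars <- c("gender", "age")

# One 0/1 indicator column per level of every factor;
# identity contrasts (contrasts = FALSE) keep all levels of each factor
dummies <- model.matrix(
  ~ . - 1,
  data = toydf[vars],
  contrasts.arg = lapply(toydf[vars], contrasts, contrasts = FALSE)
)

# Sum the indicators within each cluster
counts <- aggregate(as.data.frame(dummies),
                    by = list(cluster = toydf$cluster), FUN = sum)
print(counts)
```

Each indicator column is named by pasting the variable name and the level (for example `genderfemale` or `age65+`). This keeps binary items that share level names, such as several yes/no questions, distinct from each other.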

Let me know if I can provide more information.

It looks like you want to display two tabulations, cluster-by-gender and cluster-by-age, in one matrix:

```r
with(smalldf, cbind(table(cluster, gender), table(cluster, age)))
#----------------
#   female male 18-20 21-24 25-34 35-44 45-54 55-64 65+
# 1      2    0     1     1     0     0     0     0   0
# 2      0    4     0     0     1     1     1     1   0
# 3      1    0     0     0     0     0     0     0   1
```
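With all 11 variables of df4, writing one table() call per variable gets repetitive. The following is a minimal sketch that loops over the columns and binds the cross-tabulations side by side, in the same way as the cbind() above. It assumes the cluster ids live in a column named `cluster` and that every other column is categorical (factor or character). `cluster_counts()` is a hypothetical helper name, and `toydf` uses made-up cluster ids. The helper prefixes each level with its variable name, so identical levels in different variables stay distinguishable.

```r
# Hypothetical helper: cross-tabulate the cluster column against every
# other column and bind the tables into one count matrix.
cluster_counts <- function(df, cluster_col = "cluster") {
  vars <- setdiff(names(df), cluster_col)
  tabs <- lapply(vars, function(v) {
    tab <- table(df[[cluster_col]], df[[v]])
    # prefix each level with its variable name, e.g. "gender_female"
    colnames(tab) <- paste(v, colnames(tab), sep = "_")
    tab
  })
  # clusters in rows, one column per variable level
  do.call(cbind, tabs)
}

# Hypothetical toy data: the cluster ids stand in for pam()$clustering
toydf <- data.frame(
  age     = c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  gender  = c("female", "female", "male", "male", "male", "male", "female"),
  cluster = c(1, 1, 2, 2, 2, 2, 3),
  stringsAsFactors = TRUE
)

print(cluster_counts(toydf))
```

A numeric column passed to this helper would produce one column per distinct value. Any non-categorical columns, such as a user ID, should therefore be dropped before calling it.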
