Ensembles of K-Subspaces — EKSS • T4cluster

The Ensembles of K-Subspaces (EKSS) method runs K-Subspaces clustering many times and aggregates the resulting partitions in a consensus framework, which mitigates the effect of random initialization. When the results are merged, \(n-q\) values in the co-occurrence matrix are zeroed out. The paper suggests using a large number of runs (B), each of which need not run for many iterations (iter), since the algorithm is designed to combine many partially correct clusterings. In the extreme case, iter may be set to 0, which the paper calls EKSS-0.
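The consensus step described above can be sketched in base R. This is only an illustration of the idea for the EKSS-0 case, not the T4cluster implementation: the helper name `ekss0_affinity` is hypothetical, the thresholding is assumed to act on each row (keeping the q largest entries and zeroing the other \(n-q\)), and the final symmetrisation by element-wise maximum is one simple choice among several. The spectral clustering step that would follow is omitted.

```r
# Illustrative sketch only; ekss0_affinity() is a hypothetical helper,
# not a function exported by T4cluster.
ekss0_affinity <- function(X, k = 2, d = 2,
                           q = floor(nrow(X) * 0.75), B = 100) {
  n <- nrow(X)
  p <- ncol(X)
  A <- matrix(0, n, n)                         # co-occurrence counts
  for (b in seq_len(B)) {
    # squared projection norm of every point onto k random d-dim subspaces
    score <- sapply(seq_len(k), function(j) {
      U <- qr.Q(qr(matrix(rnorm(p * d), p, d)))  # random orthonormal basis
      rowSums((X %*% U)^2)
    })
    lab <- max.col(score, ties.method = "first") # best-fitting subspace
    A <- A + outer(lab, lab, "==")               # +1 where labels coincide
  }
  A <- A / B                                     # fraction of runs together
  # assumed thresholding: keep the q largest entries of each row
  keep <- t(apply(A, 1, function(r) rank(-r, ties.method = "first") <= q))
  A_row <- A * keep
  pmax(A_row, t(A_row))                          # symmetrise
}

# two noiseless lines in R^3, 20 points each
set.seed(1)
X <- rbind(cbind(rnorm(20), 0, 0), cbind(0, rnorm(20), 0))
A <- ekss0_affinity(X, k = 2, d = 1, B = 50)
dim(A)
```

Each entry of `A` is the fraction of the B runs in which two points received the same label, so a single poor random initialization has little influence on the final affinity.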

EKSS(data, k = 2, d = 2, q = floor(nrow(data) * 0.75), B = 500, iter = 0)

Arguments

data	an \((n\times p)\) matrix of row-stacked observations.
k	the number of clusters (default: 2).
d	candidate dimension for each subspace (default: 2).
q	threshold; the number of smaller values to be zeroed out (default: 0.75*\(n\)).
B	the number of ensembles/runs (default: 500).
iter	the number of iterations for each run (default: 0).

Value

a named list of S3 class T4cluster containing

cluster: a length-\(n\) vector of class labels (from \(1:k\)).
algorithm: name of the algorithm.

References

Lipor J, Hong D, Tan YS, Balzano L (2021). “Subspace Clustering Using Ensembles of \(K\)-Subspaces.” arXiv:1709.04744.

Examples

# \donttest{
## generate a toy example
set.seed(10)
tester = genLP(n=100, nl=2, np=1, iso.var=0.1)
data   = tester$data
label  = tester$class

## project onto the first two principal directions (used for plotting only)
proj = base::eigen(stats::cov(data))$vectors[,1:2]
dat2 = data%*%proj

## run EKSS with k=2,3,4: once as EKSS-0 (iter=0) and once with 5 iterations per run
out2zero = EKSS(data, k=2)
out3zero = EKSS(data, k=3)
out4zero = EKSS(data, k=4)

out2iter = EKSS(data, k=2, iter=5)
out3iter = EKSS(data, k=3, iter=5)
out4iter = EKSS(data, k=4, iter=5)

## extract label information
lab2zero = out2zero$cluster
lab3zero = out3zero$cluster
lab4zero = out4zero$cluster

lab2iter = out2iter$cluster
lab3iter = out3iter$cluster
lab4iter = out4iter$cluster

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3))
plot(dat2, pch=19, cex=0.9, col=lab2zero, main="EKSS-0:K=2")
plot(dat2, pch=19, cex=0.9, col=lab3zero, main="EKSS-0:K=3")
plot(dat2, pch=19, cex=0.9, col=lab4zero, main="EKSS-0:K=4")
plot(dat2, pch=19, cex=0.9, col=lab2iter, main="EKSS iter:K=2")
plot(dat2, pch=19, cex=0.9, col=lab3iter, main="EKSS iter:K=3")
plot(dat2, pch=19, cex=0.9, col=lab4iter, main="EKSS iter:K=4")
par(opar)
# }
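The example above extracts the true labels into `label` but only inspects the results visually. A numerical comparison could be sketched as follows, using the Rand index (the share of point pairs on which two partitions agree). The helper `rand_index` is hypothetical and written in base R; it is not part of T4cluster, and the toy vectors are made up for illustration.

```r
# rand_index() is a hypothetical helper, not a T4cluster function.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  same_a <- outer(a, a, "==")     # TRUE where a puts i and j together
  same_b <- outer(b, b, "==")     # TRUE where b puts i and j together
  pairs  <- upper.tri(same_a)     # each unordered pair counted once
  mean(same_a[pairs] == same_b[pairs])
}

truth <- c(1, 1, 1, 2, 2, 2)
r_same <- rand_index(truth, c(2, 2, 2, 1, 1, 1))  # same partition, relabelled
r_diff <- rand_index(truth, c(1, 1, 2, 2, 3, 3))  # a different partition
```

Because the index depends only on which points are grouped together, it is unaffected by how the clusters are numbered. Assuming `label` is a length-\(n\) vector, a call such as `rand_index(label, lab3zero)` would score one of the runs above.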