do.rndproj
is a linear dimensionality reduction method based on the
random projection technique, justified by the celebrated Johnson–Lindenstrauss lemma.
an (n×p) matrix or data frame whose rows are observations and columns represent independent variables.
an integer-valued target dimension.
an additional option for preprocessing the data.
Default is "null". See also aux.preprocess
for more details.
a type of random projection; one of "gaussian", "achlioptas", or "sparse".
a tuning parameter determining the values in the projection matrix for the "sparse" type. The default is max(log√p, 3); any user-supplied value must satisfy s ≥ 3.
a named list containing
an (n×ndim) matrix whose rows are embedded observations.
a (p×ndim) matrix whose columns are the basis for projection.
an estimated error ϵ in accordance with JL lemma.
a list containing information for out-of-sample prediction.
The Johnson–Lindenstrauss (JL) lemma states that, given 0 < ϵ < 1, a set X of m points in R^N, and a number n > 8·log(m)/ϵ², there exists a linear map f : R^N → R^n such that (1−ϵ)·‖u−v‖² ≤ ‖f(u)−f(v)‖² ≤ (1+ϵ)·‖u−v‖² for all u, v in X.
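The bound above can be checked empirically with a plain Gaussian random map. The following is a minimal base-R sketch (all variable names are illustrative; this is not how do.rndproj is implemented):

```r
## Empirically inspect the JL distortion for a scaled Gaussian map.
set.seed(1)
m   <- 20                                 # number of points
N   <- 1000                               # original dimension
eps <- 0.5                                # distortion level
n   <- ceiling(8 * log(m) / eps^2)        # target dimension from the lemma

X <- matrix(rnorm(m * N), nrow = m)       # m points in R^N
R <- matrix(rnorm(N * n), nrow = N) / sqrt(n)  # scaled Gaussian projection
Y <- X %*% R                              # images f(u)

D0 <- as.matrix(dist(X))^2                # squared distances ‖u−v‖²
D1 <- as.matrix(dist(Y))^2                # squared distances ‖f(u)−f(v)‖²
ratio <- D1[upper.tri(D1)] / D0[upper.tri(D0)]
range(ratio)                              # typically well inside (1−eps, 1+eps)
```

The pairwise distance ratios concentrate around 1, illustrating why a random linear map suffices with high probability.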
Three types of random projections are supported for the (p×ndim)
projection matrix R.
The conventional approach uses normalized Gaussian random vectors, i.e., columns sampled from the unit sphere S^(p−1).
Achlioptas suggested a sparse alternative whose entries are sampled from √3·(1, 0, −1) with probabilities (1/6, 4/6, 1/6).
Li et al. proposed sampling from √s·(1, 0, −1) with probabilities (1/(2s), 1−1/s, 1/(2s)) for s ≥ 3, incorporating sparsity and a computational speedup with little loss in accuracy. While the authors' original suggestion is to use √p or p/log(p) for s, any user-supplied s ≥ 3 is allowed.
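The three entry distributions can be sketched directly in base R. The construction below is illustrative only; the normalization and names are assumptions, not do.rndproj internals:

```r
## Illustrative construction of the three (p × ndim) projection matrices.
set.seed(2)
p    <- 50
ndim <- 2

## 1. Gaussian: columns of a Gaussian matrix normalized to unit length
R_gauss <- matrix(rnorm(p * ndim), nrow = p)
R_gauss <- sweep(R_gauss, 2, sqrt(colSums(R_gauss^2)), "/")

## 2. Achlioptas: entries √3·(+1, 0, −1) with probabilities (1/6, 4/6, 1/6)
R_ach <- matrix(sqrt(3) * sample(c(1, 0, -1), p * ndim, replace = TRUE,
                                 prob = c(1/6, 4/6, 1/6)), nrow = p)

## 3. Very sparse (Li et al.): √s·(+1, 0, −1) with
##    probabilities (1/(2s), 1 − 1/s, 1/(2s)), s ≥ 3
s <- max(sqrt(p), 3)
R_sp <- matrix(sqrt(s) * sample(c(1, 0, -1), p * ndim, replace = TRUE,
                                prob = c(1/(2*s), 1 - 1/s, 1/(2*s))), nrow = p)
```

Larger s makes R sparser (most entries zero), so computing X %*% R gets cheaper at a small cost in accuracy.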
Johnson WB, Lindenstrauss J (1984). “Extensions of Lipschitz Mappings into a Hilbert Space.” In Beals R, Beck A, Bellow A, Hajian A (eds.), Contemporary Mathematics, volume 26, 189--206. American Mathematical Society, Providence, Rhode Island. ISBN 978-0-8218-5030-5 978-0-8218-7611-4.
Achlioptas D (2003). “Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences, 66(4), 671--687.
Li P, Hastie TJ, Church KW (2006). “Very Sparse Random Projections.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, 287--296.
## use iris data
data(iris)
set.seed(100)
subid = sample(1:150, 50)
X = as.matrix(iris[subid,1:4])
label = as.factor(iris[subid,5])
## 1. Gaussian projection
output1 <- do.rndproj(X,ndim=2)
## 2. Achlioptas projection
output2 <- do.rndproj(X,ndim=2,type="achlioptas")
## 3. Sparse projection
output3 <- do.rndproj(X,ndim=2,type="sparse")
## Visualize three different projections
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(output1$Y, pch=19, col=label, main="RNDPROJ::Gaussian")
plot(output2$Y, pch=19, col=label, main="RNDPROJ::Achlioptas")
plot(output3$Y, pch=19, col=label, main="RNDPROJ::Sparse")
par(opar)