Constrained Graph Embedding (CGE) is a semi-supervised embedding method that incorporates partially available label information into the graph structure in order to find embeddings that are consistent with the labels.
an \((n\times p)\) matrix or data frame whose rows are observations
a length-\(n\) vector of class labels. Missing labels should be coded as NA.
an integer-valued target dimension.
a vector specifying how the neighborhood graph is constructed. The following types are supported: c("knn",k), c("enn",radius), and c("proportion",ratio). The default is c("proportion",0.1), which connects each point to roughly its nearest 1/10 of all data points. See also aux.graphnbd for more details.
an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.
a named list containing
an \((n\times ndim)\) matrix whose rows are embedded observations.
a list containing information for out-of-sample prediction.
He X, Ji M, Bao H (2009). “Graph Embedding with Constraints.” In IJCAI.
## load the package that provides do.cge and use iris data
library(Rdimtools)
data(iris)
X = as.matrix(iris[,2:4])
label = as.integer(iris[,5])
lcols = as.factor(label)
## copy a label and let 10% of elements be missing
nlabel = length(label)
nmissing = round(nlabel*0.10)
label_missing = label
label_missing[sample(1:nlabel, nmissing)]=NA
## try different neighborhood sizes
out1 = do.cge(X, label_missing, type=c("proportion",0.10))
out2 = do.cge(X, label_missing, type=c("proportion",0.25))
out3 = do.cge(X, label_missing, type=c("proportion",0.50))
## visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(out1$Y, main="10% connected", pch=19, col=lcols)
plot(out2$Y, main="25% connected", pch=19, col=lcols)
plot(out3$Y, main="50% connected", pch=19, col=lcols)
par(opar)
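The example above hides 10% of the labels. To experiment with other fractions, the masking step can be wrapped in a small helper. This is a minimal sketch: mask_labels is a hypothetical name and not part of any package, and the seed is fixed only so that the result is reproducible.

```r
## Hypothetical helper: replace a given fraction of labels with NA.
mask_labels <- function(label, frac = 0.10) {
  stopifnot(frac >= 0, frac <= 1)
  label    <- as.integer(as.factor(label))    # integer class codes
  nmissing <- round(length(label) * frac)     # number of labels to hide
  label[sample(seq_along(label), nmissing)] <- NA
  label
}

set.seed(1)                                   # reproducible choice of hidden labels
lab <- mask_labels(iris[, 5], frac = 0.25)
sum(is.na(lab))                               # how many labels were hidden
table(lab, useNA = "ifany")                   # remaining labels per class
```

The returned vector can be passed as the label argument in place of label_missing.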