Constrained Graph Embedding: do.cge (Rdimtools)

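Constrained Graph Embedding (CGE) is a semi-supervised embedding method that incorporates partially available label information into the graph structure in order to find embeddings consistent with the labels. Following He, Ji and Bao (2009), labeled points that share a class are constrained to coincide in the embedding space, while unlabeled points remain free.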
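The mechanism can be illustrated with a small base-R sketch. It is not Rdimtools' implementation; it follows our reading of He, Ji and Bao (2009). A constraint matrix `U` gives every labeled class one shared column and every unlabeled point its own column, the embedding is restricted to `Y = U %*% Z`, and `Z` solves the generalized eigenproblem \(U^\top L U z = \lambda\, U^\top D U z\) with graph Laplacian \(L = D - W\). The helper name `cge_sketch` and the heat-kernel weights in the usage lines are illustrative assumptions.

```r
## Minimal CGE sketch (hypothetical helper, not the Rdimtools code).
## W: symmetric nonnegative n x n weight matrix; label: NA marks unlabeled points.
cge_sketch <- function(W, label, ndim = 2) {
  n <- nrow(W)
  labeled <- !is.na(label)
  classes <- sort(unique(label[labeled]))
  free    <- which(!labeled)
  ## constraint matrix U: one shared column per class, one column per unlabeled point
  U <- matrix(0, n, length(classes) + length(free))
  for (j in seq_along(classes)) U[which(label == classes[j]), j] <- 1
  U[cbind(free, length(classes) + seq_along(free))] <- 1
  ## graph Laplacian L = D - W
  Dg <- diag(rowSums(W))
  L  <- Dg - W
  ## generalized eigenproblem A z = lambda B z, reduced via Cholesky B = R'R
  A <- crossprod(U, L %*% U)
  B <- crossprod(U, Dg %*% U)
  R <- chol(B)
  Rinv <- backsolve(R, diag(ncol(R)))
  e <- eigen(crossprod(Rinv, A %*% Rinv), symmetric = TRUE)
  m <- ncol(A)
  ## eigen() sorts eigenvalues decreasingly; drop the smallest one, which is
  ## the trivial constant solution when the graph is connected
  idx <- (m - 1):(m - ndim)
  Z <- Rinv %*% e$vectors[, idx, drop = FALSE]
  U %*% Z   # n x ndim embedding
}

## usage on iris with 10% of labels hidden and fully connected heat-kernel weights
set.seed(1)
X   <- as.matrix(iris[, 2:4])
lab <- as.integer(iris[, 5])
lab[sample(nrow(X), 15)] <- NA
W   <- exp(-as.matrix(dist(X))^2 / 2)
diag(W) <- 0
Y <- cge_sketch(W, lab, ndim = 2)
```

Because each labeled row of `U` holds a single 1 in its class column, all labeled points of one class receive identical coordinates, which is how the labels constrain the embedding in this formulation.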

do.cge(
  X,
  label,
  ndim = 2,
  type = c("proportion", 0.1),
  preprocess = c("null", "center", "scale", "cscale", "whiten", "decorrelate")
)

Arguments

X: an \((n\times p)\) matrix or data frame whose rows are observations
label: a length-\(n\) vector of data class labels. Use NA for each observation whose label is missing.
ndim: an integer-valued target dimension.
type: a vector specifying how the neighborhood graph is constructed. The following types are supported: c("knn",k), c("enn",radius), and c("proportion",ratio). The default, c("proportion",0.1), connects each data point to its nearest neighbors, which make up about 1/10 of all data points. See also aux.graphnbd for more details.
preprocess: an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.
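The `preprocess` and `type` arguments can be illustrated with two hypothetical base-R helpers, `preprocess_sketch` and `graph_from_type`. They are not the code behind aux.preprocess or aux.graphnbd: the whitening formula, the omission of "scale" and "decorrelate", and the rule \(k = \mathrm{round}(\text{ratio}\cdot n)\) for "proportion" are assumptions. One detail that does hold in R is that c("knn", 5) is a character vector, so the numeric parameter has to be converted back with as.numeric.

```r
## Hypothetical sketches of the `preprocess` and `type` arguments in base R.
## Neither function is part of Rdimtools; the exact rules used by
## aux.preprocess and aux.graphnbd are assumptions here.

## Three preprocessing options ("scale" and "decorrelate" are omitted)
preprocess_sketch <- function(X, type = c("null", "center", "cscale", "whiten")) {
  type <- match.arg(type)
  X <- as.matrix(X)
  switch(type,
    null   = X,
    center = scale(X, center = TRUE, scale = FALSE),  # subtract column means
    cscale = scale(X, center = TRUE, scale = TRUE),   # center, then unit column variance
    whiten = {
      Xc <- scale(X, center = TRUE, scale = FALSE)
      e  <- eigen(cov(Xc), symmetric = TRUE)
      ## multiply by Cov^{-1/2} so the sample covariance becomes the identity
      Xc %*% e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
    })
}

## Symmetric 0/1 adjacency matrix from a `type` specification;
## for "proportion", k = round(ratio * n) is an assumed rule
graph_from_type <- function(X, type = c("proportion", 0.1)) {
  X <- as.matrix(X)
  n <- nrow(X)
  kind  <- match.arg(type[1], c("knn", "enn", "proportion"))
  param <- as.numeric(type[2])   # c("knn", 5) is a character vector in R
  D <- as.matrix(dist(X))
  diag(D) <- Inf                 # a point is never its own neighbor
  if (kind == "enn") {
    A <- (D <= param) * 1        # connect every pair within the radius
  } else {
    k <- if (kind == "knn") param else round(param * n)
    k <- max(1, min(k, n - 1))
    A <- matrix(0, n, n)
    for (i in seq_len(n)) A[i, order(D[i, ])[seq_len(k)]] <- 1
  }
  pmax(A, t(A))                  # edge if either endpoint selects the other
}

## usage: whiten three iris columns, then connect roughly the nearest 10% of points
Xw <- preprocess_sketch(iris[, 2:4], "whiten")
A  <- graph_from_type(Xw, c("proportion", 0.1))
```

Symmetrizing with pmax means a point can end up with more than k neighbors, since it also gains edges from points that selected it.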

Value

a named list containing

Y: an \((n\times ndim)\) matrix whose rows are embedded observations.
trfinfo: a list containing information for out-of-sample prediction.

References

He X, Ji M, Bao H (2009). “Graph Embedding with Constraints.” In IJCAI.

Author

Kisung You

Examples

## load the package and use iris data
library(Rdimtools)
data(iris)
X     = as.matrix(iris[,2:4])
label = as.integer(iris[,5])
lcols = as.factor(label)

## copy the labels and let 10% of them be missing
## (fix the random seed so the example is reproducible)
set.seed(100)
nlabel = length(label)
nmissing = round(nlabel*0.10)
label_missing = label
label_missing[sample(1:nlabel, nmissing)] = NA

## try different neighborhood sizes
out1 = do.cge(X, label_missing, type=c("proportion",0.10))
out2 = do.cge(X, label_missing, type=c("proportion",0.25))
out3 = do.cge(X, label_missing, type=c("proportion",0.50))

## visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(out1$Y, main="10% connected", pch=19, col=lcols)
plot(out2$Y, main="25% connected", pch=19, col=lcols)
plot(out3$Y, main="50% connected", pch=19, col=lcols)

par(opar)