Maximum Mean Discrepancy (MMD), a measure of discrepancy between samples, is employed as the test statistic for a two-sample hypothesis test of equal distributions. The kernel matrix \(K\) is assumed to be a symmetric, positive semidefinite square matrix.
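
Following the cited reference (Gretton et al. 2012), for two samples \(\{x_i\}_{i=1}^m\) and \(\{y_j\}_{j=1}^n\) with kernel \(k\), the biased (V-statistic) estimator of the squared MMD is
\[\widehat{\mathrm{MMD}}_b^2 = \frac{1}{m^2}\sum_{i,i'} k(x_i, x_{i'}) + \frac{1}{n^2}\sum_{j,j'} k(y_j, y_{j'}) - \frac{2}{mn}\sum_{i,j} k(x_i, y_j),\]
while the unbiased (U-statistic) estimator excludes the diagonal terms from the two within-sample sums and normalizes them by \(m(m-1)\) and \(n(n-1)\) instead.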

mmd2test(K, label, method = c("b", "u"), mc.iter = 999)

Arguments

K

a kernel matrix, or an object of class kernelMatrix from the kernlab package.

label

a vector of class labels indicating group membership.

method

type of estimator to be used: "b" for the biased and "u" for the unbiased estimator of MMD. A minimal sketch of both estimators is given after this argument list.

mc.iter

the number of Monte Carlo resampling iterations used to approximate the null distribution of the test statistic; see the sketch below.
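
The following is an illustrative sketch only, showing how the two estimators and a Monte Carlo permutation \(p\)-value can be computed from a kernel matrix and a two-group label vector. The names mmd2.sketch and perm.pvalue are hypothetical, and the package's internal implementation may resample differently.

## illustrative sketch only -- mmd2.sketch and perm.pvalue are
## hypothetical names, not part of the package
mmd2.sketch <- function(K, label, method = "b") {
  idx <- (label == sort(unique(label))[1])    # membership in group 1
  Kxx <- K[idx,  idx,  drop = FALSE]          # within-group-1 kernel values
  Kyy <- K[!idx, !idx, drop = FALSE]          # within-group-2 kernel values
  Kxy <- K[idx,  !idx, drop = FALSE]          # cross-group kernel values
  m <- sum(idx); n <- sum(!idx)
  if (method == "b") {                        # biased V-statistic
    sum(Kxx)/m^2 + sum(Kyy)/n^2 - 2*sum(Kxy)/(m*n)
  } else {                                    # unbiased U-statistic: drop diagonals
    (sum(Kxx)-sum(diag(Kxx)))/(m*(m-1)) +
      (sum(Kyy)-sum(diag(Kyy)))/(n*(n-1)) - 2*sum(Kxy)/(m*n)
  }
}

## assumed resampling scheme: shuffle labels to approximate the null
perm.pvalue <- function(K, label, mc.iter = 999, method = "b") {
  obs  <- mmd2.sketch(K, label, method)
  null <- replicate(mc.iter, mmd2.sketch(K, sample(label), method))
  (1 + sum(null >= obs)) / (1 + mc.iter)
}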

Value

a (list) object of S3 class htest containing:

statistic

the value of the MMD test statistic.

p.value

\(p\)-value under \(H_0\).

alternative

alternative hypothesis.

method

name of the test.

data.name

name of the provided kernel matrix.

References

Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012). “A Kernel Two-Sample Test.” J. Mach. Learn. Res., 13, 723--773. ISSN 1532-4435.

Examples

## small test for CRAN submission
dat1 <- matrix(rnorm(60, mean= 1), ncol=2) # group 1 : 30 obs of mean  1
dat2 <- matrix(rnorm(50, mean=-1), ncol=2) # group 2 : 25 obs of mean -1

dmat <- as.matrix(dist(rbind(dat1, dat2)))  # Euclidean distance matrix
kmat <- exp(-(dmat^2))                      # build a Gaussian kernel matrix
lab  <- c(rep(1,30), rep(2,25))             # corresponding labels

mmd2test(kmat, lab)                         # run the test
#> 
#> 	Kernel Two-sample Test with Maximum Mean Discrepancy
#> 
#> data:  kmat
#> MMD = 0.28769, p-value = 0.001
#> alternative hypothesis: two distributions are not equal
#> 
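
## (hypothetical follow-up) the returned htest object can be inspected
## component-wise, as documented in the Value section;
## 'out' is an illustrative name
out <- mmd2test(kmat, lab, method="u")
out$statistic                           # unbiased MMD estimate
out$p.value                             # Monte Carlo p-value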

if (FALSE) {
## WARNING: computationally heavy.
#  Compute empirical Type I error rates under the null of equal distributions
niter  <- 496
pvals1 <- rep(0,niter)
pvals2 <- rep(0,niter)
for (i in 1:niter){
  dat  <- matrix(rnorm(200), ncol=2)           # one pooled sample from N(0,1)
  lab  <- c(rep(1,50), rep(2,50))              # arbitrary 50/50 split
  lbd  <- 0.1                                  # Gaussian kernel bandwidth
  kmat <- exp(-lbd*(as.matrix(dist(dat))^2))   # kernel matrix
  pvals1[i] <- mmd2test(kmat, lab, method="b")$p.value
  pvals2[i] <- mmd2test(kmat, lab, method="u")$p.value
  print(paste("iteration ",i," complete..",sep=""))
}

#  Visualize empirical Type I error across significance levels
alphas  <- seq(from=0.001, to=0.999, length.out=100)
errors1 <- rep(0,100)
errors2 <- rep(0,100)
for (i in 1:100){
   errors1[i] <- sum(pvals1<=alphas[i])/niter  # rejection rate at level alphas[i]
   errors2[i] <- sum(pvals2<=alphas[i])/niter
}

opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
plot(alphas, errors1, type="b", main="Biased Estimator Error",
     xlab="alpha", ylab="error", cex=0.5)
abline(a=0, b=1, lwd=1.5, col="red")    # reference line: ideal calibration
plot(alphas, errors2, type="b", main="Unbiased Estimator Error",
     xlab="alpha", ylab="error", cex=0.5)
abline(a=0, b=1, lwd=1.5, col="blue")   # reference line: ideal calibration
par(opar)
}