Choosing the number of basis vectors (the rank
Q) is the central tuning problem in NMF. nmfkc
offers several rank-selection tools, each based on a different
principle:
| function | principle | a good rank… |
|---|---|---|
nmfkc.rank |
integrated diagnostics: R-squared, effective rank, element-wise CV | minimises CV error / R-squared elbow |
nmfkc.ecv |
element-wise cross-validation (Wold) | minimises held-out error |
nmfkc.bicv |
block bi-cross-validation (Owen & Perry) | minimises held-out error |
nmfkc.consensus |
clustering stability (Brunet) | maximises stability (or minimises PAC) |
nmfkc.ard |
Bayesian automatic relevance determination (Tan & Fevotte) | surviving components after pruning |
Since there is no single “best” criterion, this vignette applies all of them to a synthetic dataset whose true rank is known, so we can see which ones recover it.
We build a non-negative 24 x 60 matrix from
three feature blocks and three sample
clusters (20 samples each), then add noise. By construction the true
rank is 3.
set.seed(123)
r_true <- 3
P <- 24 # features
n_each <- 20 # samples per cluster
N <- r_true * n_each # 60 samples
# basis: each latent factor is a distinct block of features
X <- matrix(0, P, r_true)
X[1:8, 1] <- 1; X[9:16, 2] <- 1; X[17:24, 3] <- 1
X <- X + matrix(runif(P * r_true, 0, 0.1), P, r_true) # small background
# coefficients: each sample loads mainly on its own cluster's factor
cl_true <- rep(1:r_true, each = n_each)
B <- matrix(runif(r_true * N, 0, 0.2), r_true, N)
for (j in seq_len(N)) B[cl_true[j], j] <- runif(1, 1, 2)
Y <- X %*% B
Y <- Y + matrix(rnorm(P * N, 0, 0.15 * mean(X %*% B)), P, N) # noise
Y[Y < 0] <- 0
dim(Y)
#> [1] 24 60
image(1:N, 1:P, t(Y), xlab = "sample", ylab = "feature",
main = "Synthetic data (true rank = 3)")nmfkc.ranknmfkc.rank() fits each rank and draws three curves on
one plot: R-squared (red, with a “Best (Elbow)”
marker), the broken-stick-corrected effective rank
index (green, shown for context), and the element-wise ECV
error (blue, with a “Best (Min)” marker). The recommended rank
is the ECV minimum (or the R-squared elbow).
rk <- nmfkc.rank(Y, rank = 1:6)
#> Y(24,60)~X(24,1)B(1,60)...0sec
#> Y(24,60)~X(24,2)B(2,60)...0sec
#> Y(24,60)~X(24,3)B(3,60)...0sec
#> Y(24,60)~X(24,4)B(4,60)...0sec
#> Y(24,60)~X(24,5)B(5,60)...0sec
#> Y(24,60)~X(24,6)B(6,60)...0sec
#> Running Element-wise CV (this may take time)...
#> Performing Element-wise CV for Q = 1,2,3,4,5,6 (5-fold)...
#> Note: sample-clustering quality (silhouette / CPCC / dist.cor) is not part of rank selection; compute it from a list of fits with nmf.cluster.criteria(). See ?nmf.cluster.criteriark$rank.best # recommended rank (ECV minimum)
#> [1] 3
round(rk$criteria, 3)
#> rank effective.rank effective.rank.ratio r.squared sigma.ecv
#> 1 1 NA NA 0.044 0.710
#> 2 2 2.000 1.000 0.549 0.520
#> 3 3 2.988 0.996 0.984 0.106
#> 4 4 3.438 0.860 0.985 0.113
#> 5 5 3.440 0.688 0.986 0.120
#> 6 6 4.587 0.764 0.987 0.132
#> effective.rank.expected effective.rank.index
#> 1 1.000 NA
#> 2 1.649 0.999
#> 3 2.301 0.982
#> 4 2.955 0.462
#> 5 3.609 0.000
#> 6 4.263 0.186nmfkc.ecv and
nmfkc.bicvBoth are predictive engines that return the held-out error per rank;
the recommended rank minimises it. nmfkc.ecv holds out
scattered entries (Wold’s CV); nmfkc.bicv holds
out a block of rows and columns at once (Owen & Perry).
ev <- nmfkc.ecv (Y, rank = 1:6)
#> Performing Element-wise CV for Q = 1,2,3,4,5,6 (5-fold)...
bv <- nmfkc.bicv(Y, rank = 1:6) # nfolds = 2 (Owen & Perry) by default
#> bi-CV: ranks 1,2,3,4,5,6, 2x2 fold grid (Owen-Perry 2009)...
data.frame(rank = 1:6,
sigma.ecv = round(ev$sigma, 3),
sigma.bicv = round(bv$sigma, 3))
#> rank sigma.ecv sigma.bicv
#> Q=1 1 0.710 0.725
#> Q=2 2 0.520 0.531
#> Q=3 3 0.106 0.110
#> Q=4 4 0.113 0.111
#> Q=5 5 0.120 0.111
#> Q=6 6 0.132 0.111
cat("ECV picks rank", which.min(ev$sigma),
"| bi-CV picks rank", bv$rank[which.min(bv$sigma)], "\n")
#> ECV picks rank 3 | bi-CV picks rank 3nmfkc.consensusFor each rank, NMF is run many times from random initialisations and
the reproducibility of the sample clustering is summarised by
cophenetic (Brunet), dispersion (Kim &
Park; higher = crisper) and pac (Senbabaoglu; lower = less
ambiguous).
cs <- nmfkc.consensus(Y, rank = 2:6, nrun = 20, keep.consensus = TRUE)
#> consensus: ranks 2,3,4,5,6, 20 runs/rank (Brunet 2004); 100 fits total...
cs
#> Consensus rank selection (Brunet 2004), 20 runs/rank
#> best: dispersion max = rank 3 | PAC min = rank 2
#> rank cophenetic dispersion pac
#> 2 0.999 0.916 0.000
#> 3 1.000 1.000 0.000
#> 4 1.000 0.968 0.009
#> 5 1.000 0.956 0.016
#> 6 0.999 0.904 0.081
plot(cs) # stability curvesEach panel is the consensus matrix at one rank, reordered by hierarchical clustering (blue = 0 / different cluster, red = 1 / same cluster):
A striking point: the consensus stays essentially three crisp
blocks even at ranks 4–6. The surplus basis vectors do
not create new reproducible clusters — at
ranks 4–5 they stay near-empty (the hard clustering still uses only
three factors), and at rank 6 a cluster splits only erratically
across restarts, which averages out (just a few intermediate, pink
entries, and a small drop in dispersion / rise in
PAC). So consensus reveals the genuine stable structure
(3) robustly, even when the rank is
over-specified — a useful property. (Use
plot(cs, type = "heatmap", rank = 3) to enlarge a single
rank with sample labels.)
nmfkc.ardARD fits NMF once at an over-complete rank and
prunes unneeded components automatically; the relevance bar plot is read
like a scree plot (look for the knee). Because it is a sensitive point
estimate, we aggregate over several random restarts
(nrun).
ar <- nmfkc.ard(Y, rank = 8, nrun = 20) # >=10-20 restarts for a stable mode
ar # see "rank over runs" for confidence
#> ARD-NMF rank determination (Tan-Fevotte 2013, L2 prior, 20 runs)
#> starting rank: 8 -> estimated rank (mode): 3
#> rank over runs: 3:20 (median 3)
#> relevance (representative run): 1 0.989 0.875 0 0 0 0 0
plot(ar)data.frame(
method = c("nmfkc.rank (ECV)", "nmfkc.ecv", "nmfkc.bicv",
"nmfkc.consensus (dispersion)", "nmfkc.consensus (PAC)",
"nmfkc.ard"),
estimate = c(rk$rank.best,
which.min(ev$sigma),
bv$rank[which.min(bv$sigma)],
cs$rank[which.max(cs$dispersion)],
cs$rank[which.min(cs$pac)],
ar$rank),
true = r_true
)
#> method estimate true
#> 1 nmfkc.rank (ECV) 3 3
#> 2 nmfkc.ecv 3 3
#> 3 nmfkc.bicv 3 3
#> 4 nmfkc.consensus (dispersion) 3 3
#> 5 nmfkc.consensus (PAC) 2 3
#> 6 nmfkc.ard 3 3A complementary view: nmf.cluster.flow() draws a Sankey
/ alluvial diagram of how the hard sample clustering changes as the rank
grows. We fit ranks 1:6 (so the list index equals the rank)
and colour the flows by the clustering at the true
rank, reference = 3. The adjacent-rank ARI
(agreement) is printed along the top.
fits <- lapply(1:6, function(q) nmfkc(Y, Q = q, print.dims = FALSE))
fl <- nmf.cluster.flow(fits, reference = 3)head(fl$clusters) # rows = individuals, columns = rank, entries = cluster id
#> 1 2 3 4 5 6
#> i1 1 1 1 2 2 3
#> i2 1 1 1 2 3 2
#> i3 1 1 1 1 1 1
#> i4 1 1 1 1 1 1
#> i5 1 1 1 1 1 1
#> i6 1 1 1 1 3 2At rank = 1 all 60 samples form a single group; at
rank = 3 they split cleanly into the three true clusters
(20 each); beyond that the groups over-split (the ARI to the next rank
starts to fall) — visual confirmation that 3 is the right
number of basis vectors here.
On this clean dataset with a clear rank-3 structure, almost
every criterion recovers the true rank: the predictive
cross-validations (nmfkc.ecv, nmfkc.bicv), the
integrated nmfkc.rank, the consensus
dispersion (with a crisp three-block consensus matrix) and
ARD (every nrun restart agrees on 3, relevance
dropping to zero after the third component) all give 3.
Only PAC leans to the coarser rank 2 — a small
hint of the low-rank bias of stability measures.
On real data the criteria often disagree much more: stability/consensus tends to favour low (coarse) ranks, and ARD’s count can track the starting rank when the energy spectrum decays smoothly (no clear knee). A robust practical recipe is to triangulate: use ECV for the predictive upper bound, the effective-rank / relevance knee for the lower bound, and the consensus heatmap to confirm an interpretable block structure — then choose one rank by purpose and domain knowledge.
See ?nmfkc.rank, ?nmfkc.ecv,
?nmfkc.bicv, ?nmfkc.consensus and
?nmfkc.ard (all cross-referenced) for details.