Skip to content

solver

eempy.solver.core

Solver routines for excitation–emission matrix (EEM) decomposition.

This module provides numerical solvers used to decompose an EEM stack into a small number of non-negative components using matrix (NMF) and tensor (CP/PARAFAC) factorization models.

Implemented features include: - HALS/ALS-style NNLS updates for NMF and CP factors, - quadratic (Tikhonov) priors with NaN-masked entries, - elastic-net regularization (L1/L2 mix), - optional paired-sample ratio constraints on sample-mode scores, - elementwise masking to ignore sparse invalid tensor entries.

hals_nnls

hals_nnls(UtM, UtU, prior_dict=None, gamma=None, alpha=0, l1_ratio=0, V=None, max_iter=500, tol=1e-08, eps=1e-08, fixed_components=None)

HALS-style non-negative least squares update with optional priors and elastic-net.

This routine updates a factor matrix V (with shape (rank, n_features)) in a least-squares subproblem in NMF and PARAFAC:

min_{V >= 0}  1/2 || UtM - UtU @ V ||_F^2
              + sum_k (gamma_k/2) ||mask_k * (V_k - p_k)||_2^2
              + alpha * [ l1_ratio * ||V||_1 + (1 - l1_ratio)/2 * ||V||_F^2 ]

where: - UtM corresponds to UᵀM (or a mode-wise MTTKRP in PARAFAC), - UtU corresponds to UᵀU (Gram matrix), - prior_dict[k] = p_k is an optional per-component prior vector for row k, with NaNs allowed to indicate "no prior" at specific entries, - gamma can be a scalar applied to all prior components or a dict {k: gamma_k} to specify gamma values for different priors.

The update is performed row-by-row (FastHALS residual update), and can keep selected components fixed (useful when seeding or freezing known spectra).

Parameters:

Name Type Description Default
UtM np.ndarray, shape (rank, n)

Cross-product term (e.g., UᵀM or MTTKRPᵀ), with one row per component.

required
UtU np.ndarray, shape (rank, rank)

Gram matrix term (e.g., UᵀU).

required
prior_dict dict[int, array-like]

Mapping from component index k to a prior vector of length n. NaNs are treated as "no prior" for those entries.

None
gamma float or dict[int, float]

Prior strength. If a float is provided, it is applied to all keys in prior_dict. If a dict is provided, strengths are component-specific.

None
alpha float

Overall elastic-net strength (0 disables elastic-net).

0
l1_ratio float

Elastic-net mixing parameter in [0, 1]. 0 = pure L2, 1 = pure L1.

0
V np.ndarray, shape (rank, n)

Initial value for V. If None, a least-squares solution is computed and clipped to be positive.

None
max_iter int

Maximum number of HALS iterations.

500
tol float

Relative convergence tolerance based on HALS update magnitude.

1e-08
eps float

Small positive constant used for numerical stability and positivity floor.

1e-08
fixed_components list[int]

Indices of components (rows of V) that should not be updated.

None

Returns:

Name Type Description
V (ndarray, shape(rank, n))

Updated non-negative factor matrix.

nmf_with_prior_hals

nmf_with_prior_hals(X, rank, prior_dict_H=None, prior_dict_W=None, prior_dict_A=None, prior_dict_B=None, prior_dict_C=None, ref_components=None, gamma_W=0, gamma_H=0, gamma_A=0, gamma_B=0, gamma_C=0, alpha_W=0, alpha_H=0, alpha_A=0, alpha_B=0, alpha_C=0, l1_ratio=0, idx_top=None, idx_bot=None, lam=0, fit_rank_one=False, component_shape=None, max_iter_als=500, max_iter_nnls=500, tol=1e-06, eps=1e-08, init='random', custom_init=None, fixed_components=None, random_state=None)

Non-negative matrix factorization (NMF) using HALS with optional priors and regularizations.

This solver decomposes a non-negative matrix X into:

X ≈ W @ H

where W (m × rank) contains sample scores / concentrations and H (rank × n) contains component spectra/features. It is designed for EEM decomposition where X can be an unfolded EEM stack and where components may be constrained by priors or by a paired-sample ratio constraint.

Supported options include: - Quadratic priors on W and/or H (NaNs skip entries). - Elastic-net regularization on W and/or H. - A paired-row ratio penalty on W: W[idx_top] ≈ beta * W[idx_bot] with weight lam. - A hybrid model where selected components are forced to be rank-1 outer products in EEM space (CP-like): H[k] reshaped to (n_ex, n_em) is approximated by outer(B[:,k], C[:,k]). This is controlled via fit_rank_one and requires component_shape.

Parameters:

Name Type Description Default
X np.ndarray, shape (m, n)

Non-negative data matrix (e.g., unfolded EEM stack).

required
rank int

Number of components.

required
prior_dict_H dict[int, array-like]

Priors for rows of H (length n). NaNs skip entries.

None
prior_dict_W dict[int, array-like]

Priors for columns of W (length m). NaNs skip entries.

None
prior_dict_A dict[int, array-like]

Priors for CP sample-mode factor A when using rank-1 components.

None
prior_dict_B dict[int, array-like]

Priors for CP excitation factor B when using rank-1 components.

None
prior_dict_C dict[int, array-like]

Priors for CP emission factor C when using rank-1 components.

None
ref_components dict[int, array-like]

Reference components used to permute the solution for consistent ordering.

None
gamma_W float or dict[int, float]

Prior weight(s) for W.

0
gamma_H float or dict[int, float]

Prior weight(s) for H.

0
gamma_A float or dict[int, float]

Prior weight(s) for CP sample-mode factor A in the rank-1 part.

0
gamma_B float or dict[int, float]

Prior weight(s) for CP excitation factor B in the rank-1 part.

0
gamma_C float or dict[int, float]

Prior weight(s) for CP emission factor C in the rank-1 part.

0
alpha_W float

Elastic-net strength for W.

0
alpha_H float

Elastic-net strength for H.

0
alpha_A float

Elastic-net strength for CP sample-mode factor A (rank-1 part).

0
alpha_B float

Elastic-net strength for CP excitation factor B (rank-1 part).

0
alpha_C float

Elastic-net strength for CP emission factor C (rank-1 part).

0
l1_ratio float

Elastic-net mixing parameter in [0, 1].

0
idx_top sequence[int]

Row indices in W for the paired "top" samples.

None
idx_bot sequence[int]

Row indices in W for the paired "bottom" samples.

None
lam float or dict[int, float]

Ratio-penalty weight(s). Enable ratio constraint when provided together with idx_top and idx_bot.

0
fit_rank_one dict[int, bool] or False

If a dict, fit_rank_one[k]=True marks component k to be modeled as a rank-1 outer product in EEM space (CP-like). If False, standard NMF is used.

False
component_shape tuple[int, int]

(n_ex, n_em) used to reshape rows of H when using rank-1 components.

None
max_iter_als int

Maximum number of outer ALS iterations.

500
max_iter_nnls int

Maximum number of HALS iterations per NNLS subproblem.

500
tol float

Relative convergence tolerance on the reconstruction error.

1e-06
eps float

Small positive constant used as a positivity floor.

1e-08
init str, {"random", "svd", "nndsvd", "nndsvda", "nndsvdar", "ordinary_nmf", "ordinary_cp", "custom"}

Initialization method.

'random'
custom_init tuple(W, H)

Custom initialization, only used when init="custom".

None
fixed_components list[int]

Indices of components to keep fixed during updates.

None
random_state int or None

Random seed.

None

Returns:

Name Type Description
W (ndarray, shape(m, rank))

Non-negative sample scores / concentrations.

H (ndarray, shape(rank, n))

Non-negative component matrix (flattened EEM components if applicable).

beta (ndarray or None, shape(rank))

Per-component ratio estimates when the ratio penalty is enabled; otherwise None.

parafac_with_prior_hals

parafac_with_prior_hals(tensor, rank, prior_dict_A=None, prior_dict_B=None, prior_dict_C=None, gamma_A=0, gamma_B=0, gamma_C=0, ref_components=None, alpha_A=0, alpha_B=0, alpha_C=0, l1_ratio=0, lam=0, idx_top=None, idx_bot=None, max_iter_als=200, max_iter_nnls=500, tol=1e-09, eps=1e-08, init='svd', custom_init=None, random_state=None, fixed_components=None, mask=None)

Non-negative PARAFAC decomposition for EEM stacks using HALS with optional priors and masking.

This solver factorizes a 3-way tensor X into non-negative PARAFAC factors:

X ≈ [[A, B, C]]

where: - A (n_samples × rank) contains sample scores / concentrations, - B (n_ex × rank) contains excitation loadings, - C (n_em × rank) contains emission loadings.

Optional features supported by this implementation: - Sparse elementwise masking. - Quadratic priors on any factor (A/B/C), with NaNs allowed to skip entries. - Elastic-net regularization on any factor (L1/L2 mix). - A ratio constraint on paired rows of A: A[idx_top] ≈ beta * A[idx_bot], controlled by lam and estimated per component (beta).

Parameters:

Name Type Description Default
tensor array-like, shape (I, J, K)

Input tensor (EEM stack). Non-negative values are expected.

required
rank int

Number of CP components.

required
prior_dict_A dict[int, array-like]

Priors for columns of A (sample-mode factor). Each prior vector must have length I. NaNs indicate entries without a prior.

None
prior_dict_B dict[int, array-like]

Priors for columns of B (excitation factor), length J.

None
prior_dict_C dict[int, array-like]

Priors for columns of C (emission factor), length K.

None
gamma_A float or dict[int, float]

Prior weight(s) for A.

0
gamma_B float or dict[int, float]

Prior weight(s) for B.

0
gamma_C float or dict[int, float]

Prior weight(s) for C.

0
ref_components dict[int, array-like]

Reference components used to permute factors for consistent component order. Each value should be a flattened (J*K,) reference spectrum (e.g., outer(B,C)).

None
alpha_A float

Elastic-net strength for A.

0
alpha_B float

Elastic-net strength for B.

0
alpha_C float

Elastic-net strength for C.

0
l1_ratio float

Elastic-net mixing parameter in [0, 1].

0
lam float or dict[int, float]

Ratio-penalty weight(s) for the sample-mode factor A. If provided together with idx_top and idx_bot, beta is estimated per component.

0
idx_top sequence[int]

Row indices in A corresponding to the "top/original" samples in paired data.

None
idx_bot sequence[int]

Row indices in A corresponding to the "bottom/perturbed" samples in paired data.

None
max_iter_als int

Maximum number of outer ALS iterations.

200
max_iter_nnls int

Maximum number of inner HALS iterations for each NNLS subproblem.

500
tol float

Relative convergence tolerance on masked reconstruction error.

1e-09
eps float

Small positive constant for numerical stability and positivity floor.

1e-08
init str, {"random", "svd", "nndsvd", "nndsvda", "nndsvdar", "ordinary_cp", "custom"}

Initialization method for factors.

'svd'
custom_init tuple(A, B, C)

Custom initialization, only used when init="custom".

None
random_state int or None

Random seed for reproducible initialization.

None
fixed_components list[int]

Component indices to keep fixed during updates (useful when seeding known spectra).

None
mask array-like, shape (I, J, K)

Elementwise mask (1 = keep / valid entry, 0 = ignore entry). If None, finite entries of tensor are treated as valid.

None

Returns:

Name Type Description
A (ndarray, shape(I, rank))

Sample-mode factor (scores / concentrations).

B (ndarray, shape(J, rank))

Excitation factor loadings.

C (ndarray, shape(K, rank))

Emission factor loadings.

beta (ndarray or None, shape(rank))

Estimated per-component ratio between paired sample rows when the ratio penalty is enabled; otherwise None.

replace_factor_with_prior

replace_factor_with_prior(factors, prior, replaced_mode, replaced_rank='best-fit', frozen_rank=None, project_prior=True, X=None, show_replaced_rank=False)

Replace a single component in a two-factor decomposition with a prior vector.

This helper is primarily used for NMF/PARAFAC initialization, where a loading is replaced by a prior vector. This is particularly useful when the user has prior knowledge of some of the factors while keeping other unknown factors initialized via standard methods. It can (i) select which component to replace by maximizing correlation with the prior ("best-fit"), (ii) directly replace a specified component, and (iii) optionally project the prior to the other factor to keep the reconstruction consistent. E.g., when replacing a loading in one mode, the corresponding loading in the other mode is updated via a non-negative least-squares projection.

Notes
  • Replacement is performed in-place on factors.
  • When project_prior=True, the method computes a residual excluding the replaced component and projects the prior onto the other mode using a non-negative least-squares-style clipping.

Parameters:

Name Type Description Default
factors list[np.ndarray]

Factor matrices for a rank-r matrix model. Expected shapes are [F0, F1] where F0.shape = (m, r) and F1.shape = (n, r).

required
prior array-like, shape (m,) or (n,)

Prior vector to insert into the selected component of factors[replaced_mode].

required
replaced_mode int

Which factor to modify: 0 or 1.

required
replaced_rank int or {"best-fit"}

Component index to replace. If "best-fit", the component with the highest Pearson correlation to prior is replaced (excluding frozen_rank).

'best-fit'
frozen_rank int or None

Component index that must not be replaced (useful when one component is fixed).

None
project_prior bool

If True, also update the complementary factor column by projecting the prior onto the residual (requires X).

True
X np.ndarray

Data matrix being factorized, shape (m, n). Required if project_prior=True.

None
show_replaced_rank bool

If True, also return the selected replaced_rank.

False

Returns:

Name Type Description
factors list[ndarray]

Updated factor matrices.

replaced_rank (int, optional)

Only returned when show_replaced_rank=True.

solve_W

solve_W(X1, H, X2=None, beta=None, reg=0.0, non_negativity=True)

Solve for the sample-score matrix W in a (possibly constrained) linear regression subproblem.

This helper solves for W in one of the following least-squares problems:

1) Standard regression (single dataset): minimize_W || X1 - W @ H ||_F^2

2) Paired / ratio-augmented regression (two datasets): minimize_W || X1 - W @ H ||_F^2 + || X2 - W @ diag(beta) @ H ||_F^2

Optionally, non-negativity can be enforced via row-wise NNLS.

Parameters:

Name Type Description Default
X1 np.ndarray, shape (m, n)

Primary data matrix.

required
H np.ndarray, shape (r, n)

Component matrix (rows are components).

required
X2 np.ndarray, shape (m, n)

Secondary data matrix used when beta is provided.

None
beta np.ndarray, shape (r,)

Per-component scaling applied to the secondary term. If None, the second term is omitted.

None
reg float

Ridge (L2) regularization strength added to the normal equations (0 disables).

0.0
non_negativity bool

If True, enforce W >= 0 using NNLS. If False, solve the unconstrained normal equations.

True

Returns:

Name Type Description
W (ndarray, shape(m, r))

Estimated sample-score / concentration matrix.

unfolded_eem_stack_initialization

unfolded_eem_stack_initialization(M, rank, method='nndsvd')

Initialize non-negative matrix factors from an unfolded EEM stack.

Given a non-negative matrix M (typically an unfolding of a 3D EEM stack), this routine produces an initialization (U, V) suitable for NMF / HALS updates.

Supported initializations include: - ordinary_nmf : U, V generated through scikit-learn NMF with NNDSVD init. - svd : absolute-value SVD-based factors (non-negative by clipping). - nndsvd, nndsvda, nndsvdar : NNDSVD variants (Boutsidis & Gallopoulos), where zeros are kept (nndsvd), filled with the mean (nndsvda), or filled with small random values (nndsvdar).

Parameters:

Name Type Description Default
M np.ndarray, shape (m, n)

Non-negative matrix to factorize (e.g., unfolded tensor).

required
rank int

Number of components (latent factors).

required
method str, {"ordinary_nmf", "svd", "nndsvd", "nndsvda", "nndsvdar"}

Initialization method. Default is "nndsvd".

'nndsvd'

Returns:

Name Type Description
U (ndarray, shape(m, rank))

Left factor initialization.

V (ndarray, shape(rank, n))

Right factor initialization.

References

[1] Boutsidis, Christos, and Efstratios Gallopoulos. "SVD based initialization: A head start for nonnegative matrix factorization." Pattern recognition 41.4 (2008): 1350-1362.

update_beta_in_hals

update_beta_in_hals(W: ndarray, idx_top, idx_bot, eps: float = 0, boundaries: tuple = (0.95, 2)) -> np.ndarray

Estimate per-component ratios between paired sample rows.

Given a factor matrix W and paired row indices (idx_top, idx_bot), this function estimates a ratio beta[j] for each component j such that:

W[idx_top, j] ≈ beta[j] * W[idx_bot, j]

The least-squares solution is computed independently per component and then clamped to a user-provided interval.

Parameters:

Name Type Description Default
W np.ndarray, shape (m, r)

Score / concentration matrix with m samples and r components.

required
idx_top sequence[int]

Row indices corresponding to the "top/original" samples.

required
idx_bot sequence[int]

Row indices corresponding to the "bottom/perturbed" samples. Must have the same length as idx_top.

required
eps float

Small constant added to the denominator to avoid division by zero.

0
boundaries tuple[float, float]

(min_beta, max_beta) bounds used to clamp each estimated ratio.

(0.95, 2)

Returns:

Name Type Description
beta (ndarray, shape(r))

Estimated ratio for each component, clamped to [min_beta, max_beta].

update_column_in_hals

update_column_in_hals(Rk, hk_norm2, beta_k, lam, k, prior_dict=None, gamma=0.0, alpha=0.0, l1_ratio=0.0, idx_top=None, idx_bot=None, eps=1e-08)

HALS-style update for one column with a paired-row ratio penalty.

This update is designed for sample-mode factors (e.g., W in NMF or A in PARAFAC) when the dataset contains paired samples (top/bottom) that are expected to follow a component-wise ratio:

w[top_i] ≈ beta_k * w[bot_i]

The subproblem solved for a single component column w_k is:

1/2 || Rk - d * w_k ||_2^2
  • (lam/2) * sum_i (w[top_i] - beta_k * w[bot_i])^2
  • (gamma/2) * ||mask * (w_k - p_k)||_2^2
  • alpha * [ l1_ratio * ||w_k||_1 + (1 - l1_ratio)/2 * ||w_k||_2^2 ]

where NaNs in the prior are ignored (masked out).

Parameters:

Name Type Description Default
Rk np.ndarray, shape (m,)

Current HALS residual for component k (with the k-th contribution restored).

required
hk_norm2 float

Squared norm of the corresponding component in the other factor(s) (i.e., the diagonal element d of the Gram matrix).

required
beta_k float

Current ratio estimate for component k.

required
lam float

Ratio penalty weight for component k.

required
k int

Component index (used to look up prior_dict[k]).

required
prior_dict dict[int, array-like]

Mapping from component index to a prior vector of length m (NaNs allowed).

None
gamma float

Prior weight.

0.0
alpha float

Elastic-net strength.

0.0
l1_ratio float

Elastic-net mixing parameter in [0, 1].

0.0
idx_top sequence[int]

Indices of "top/original" rows in the paired sample design.

None
idx_bot sequence[int]

Indices of "bottom/perturbed" rows paired with idx_top.

None
eps float

Small constant used as a positivity floor and for numerical stability.

1e-08

Returns:

Name Type Description
w_new (ndarray, shape(m))

Updated non-negative column for component k.