solver

eempy.solver.core

Solver routines for excitation–emission matrix (EEM) decomposition.

This module provides numerical solvers used to decompose an EEM stack into a small number of non-negative components using matrix (NMF) and tensor (CP/PARAFAC) factorization models.

Implemented features include: - HALS/ALS-style NNLS updates for NMF and CP factors, - quadratic (Tikhonov) priors with NaN-masked entries, - elastic-net regularization (L1/L2 mix), - optional paired-sample ratio constraints on sample-mode scores, - elementwise masking to ignore sparse invalid tensor entries.

hals_nnls

hals_nnls(UtM, UtU, prior_dict=None, gamma=None, alpha=0, l1_ratio=0, V=None, max_iter=500, tol=1e-08, eps=1e-08, fixed_components=None)

HALS-style non-negative least squares update with optional priors and elastic-net.

This routine updates a factor matrix V (with shape (rank, n_features)) in a least-squares subproblem in NMF and PARAFAC:

min_{V >= 0}  1/2 || UtM - UtU @ V ||_F^2
              + sum_k (gamma_k/2) ||mask_k * (V_k - p_k)||_2^2
              + alpha * [ l1_ratio * ||V||_1 + (1 - l1_ratio)/2 * ||V||_F^2 ]

where: - UtM corresponds to UᵀM (or a mode-wise MTTKRP in PARAFAC), - UtU corresponds to UᵀU (Gram matrix), - prior_dict[k] = p_k is an optional per-component prior vector for row k, with NaNs allowed to indicate "no prior" at specific entries, - gamma can be a scalar applied to all prior components or a dict {k: gamma_k} to specify gamma values for different priors.

The update is performed row-by-row (FastHALS residual update), and can keep selected components fixed (useful when seeding or freezing known spectra).

Parameters:

Name	Type	Description	Default
`UtM`	`np.ndarray, shape (rank, n)`	Cross-product term (e.g., UᵀM or MTTKRPᵀ), with one row per component.	required
`UtU`	`np.ndarray, shape (rank, rank)`	Gram matrix term (e.g., UᵀU).	required
`prior_dict`	`dict[int, array-like]`	Mapping from component index `k` to a prior vector of length `n`. NaNs are treated as "no prior" for those entries.	`None`
`gamma`	`float or dict[int, float]`	Prior strength. If a float is provided, it is applied to all keys in `prior_dict`. If a dict is provided, strengths are component-specific.	`None`
`alpha`	`float`	Overall elastic-net strength (0 disables elastic-net).	`0`
`l1_ratio`	`float`	Elastic-net mixing parameter in [0, 1]. 0 = pure L2, 1 = pure L1.	`0`
`V`	`np.ndarray, shape (rank, n)`	Initial value for `V`. If None, a least-squares solution is computed and clipped to be positive.	`None`
`max_iter`	`int`	Maximum number of HALS iterations.	`500`
`tol`	`float`	Relative convergence tolerance based on HALS update magnitude.	`1e-08`
`eps`	`float`	Small positive constant used for numerical stability and positivity floor.	`1e-08`
`fixed_components`	`list[int]`	Indices of components (rows of `V`) that should not be updated.	`None`

Returns:

Name	Type	Description
`V`	`(ndarray, shape(rank, n))`	Updated non-negative factor matrix.

nmf_with_prior_hals

nmf_with_prior_hals(X, rank, prior_dict_H=None, prior_dict_W=None, prior_dict_A=None, prior_dict_B=None, prior_dict_C=None, ref_components=None, gamma_W=0, gamma_H=0, gamma_A=0, gamma_B=0, gamma_C=0, alpha_W=0, alpha_H=0, alpha_A=0, alpha_B=0, alpha_C=0, l1_ratio=0, idx_top=None, idx_bot=None, lam=0, fit_rank_one=False, component_shape=None, max_iter_als=500, max_iter_nnls=500, tol=1e-06, eps=1e-08, init='random', custom_init=None, fixed_components=None, random_state=None)

Non-negative matrix factorization (NMF) using HALS with optional priors and regularizations.

This solver decomposes a non-negative matrix X into:

X ≈ W @ H

where W (m × rank) contains sample scores / concentrations and H (rank × n) contains component spectra/features. It is designed for EEM decomposition where X can be an unfolded EEM stack and where components may be constrained by priors or by a paired-sample ratio constraint.

Supported options include: - Quadratic priors on W and/or H (NaNs skip entries). - Elastic-net regularization on W and/or H. - A paired-row ratio penalty on W: W[idx_top] ≈ beta * W[idx_bot] with weight lam. - A hybrid model where selected components are forced to be rank-1 outer products in EEM space (CP-like): H[k] reshaped to (n_ex, n_em) is approximated by outer(B[:,k], C[:,k]). This is controlled via fit_rank_one and requires component_shape.

Parameters:

Name	Type	Description	Default
`X`	`np.ndarray, shape (m, n)`	Non-negative data matrix (e.g., unfolded EEM stack).	required
`rank`	`int`	Number of components.	required
`prior_dict_H`	`dict[int, array-like]`	Priors for rows of `H` (length `n`). NaNs skip entries.	`None`
`prior_dict_W`	`dict[int, array-like]`	Priors for columns of `W` (length `m`). NaNs skip entries.	`None`
`prior_dict_A`	`dict[int, array-like]`	Priors for CP sample-mode factor `A` when using rank-1 components.	`None`
`prior_dict_B`	`dict[int, array-like]`	Priors for CP excitation factor `B` when using rank-1 components.	`None`
`prior_dict_C`	`dict[int, array-like]`	Priors for CP emission factor `C` when using rank-1 components.	`None`
`ref_components`	`dict[int, array-like]`	Reference components used to permute the solution for consistent ordering.	`None`
`gamma_W`	`float or dict[int, float]`	Prior weight(s) for `W`.	`0`
`gamma_H`	`float or dict[int, float]`	Prior weight(s) for `H`.	`0`
`gamma_A`	`float or dict[int, float]`	Prior weight(s) for CP sample-mode factor `A` in the rank-1 part.	`0`
`gamma_B`	`float or dict[int, float]`	Prior weight(s) for CP excitation factor `B` in the rank-1 part.	`0`
`gamma_C`	`float or dict[int, float]`	Prior weight(s) for CP emission factor `C` in the rank-1 part.	`0`
`alpha_W`	`float`	Elastic-net strength for `W`.	`0`
`alpha_H`	`float`	Elastic-net strength for `H`.	`0`
`alpha_A`	`float`	Elastic-net strength for CP sample-mode factor `A` (rank-1 part).	`0`
`alpha_B`	`float`	Elastic-net strength for CP excitation factor `B` (rank-1 part).	`0`
`alpha_C`	`float`	Elastic-net strength for CP emission factor `C` (rank-1 part).	`0`
`l1_ratio`	`float`	Elastic-net mixing parameter in [0, 1].	`0`
`idx_top`	`sequence[int]`	Row indices in `W` for the paired "top" samples.	`None`
`idx_bot`	`sequence[int]`	Row indices in `W` for the paired "bottom" samples.	`None`
`lam`	`float or dict[int, float]`	Ratio-penalty weight(s). Enable ratio constraint when provided together with `idx_top` and `idx_bot`.	`0`
`fit_rank_one`	`dict[int, bool] or False`	If a dict, `fit_rank_one[k]=True` marks component `k` to be modeled as a rank-1 outer product in EEM space (CP-like). If False, standard NMF is used.	`False`
`component_shape`	`tuple[int, int]`	`(n_ex, n_em)` used to reshape rows of `H` when using rank-1 components.	`None`
`max_iter_als`	`int`	Maximum number of outer ALS iterations.	`500`
`max_iter_nnls`	`int`	Maximum number of HALS iterations per NNLS subproblem.	`500`
`tol`	`float`	Relative convergence tolerance on the reconstruction error.	`1e-06`
`eps`	`float`	Small positive constant used as a positivity floor.	`1e-08`
`init`	`str, {"random", "svd", "nndsvd", "nndsvda", "nndsvdar", "ordinary_nmf", "ordinary_cp", "custom"}`	Initialization method.	`'random'`
`custom_init`	`tuple(W, H)`	Custom initialization, only used when `init="custom"`.	`None`
`fixed_components`	`list[int]`	Indices of components to keep fixed during updates.	`None`
`random_state`	`int or None`	Random seed.	`None`

Returns:

Name	Type	Description
`W`	`(ndarray, shape(m, rank))`	Non-negative sample scores / concentrations.
`H`	`(ndarray, shape(rank, n))`	Non-negative component matrix (flattened EEM components if applicable).
`beta`	`(ndarray or None, shape(rank))`	Per-component ratio estimates when the ratio penalty is enabled; otherwise None.

parafac_with_prior_hals

parafac_with_prior_hals(tensor, rank, prior_dict_A=None, prior_dict_B=None, prior_dict_C=None, gamma_A=0, gamma_B=0, gamma_C=0, ref_components=None, alpha_A=0, alpha_B=0, alpha_C=0, l1_ratio=0, lam=0, idx_top=None, idx_bot=None, max_iter_als=200, max_iter_nnls=500, tol=1e-09, eps=1e-08, init='svd', custom_init=None, random_state=None, fixed_components=None, mask=None)

Non-negative PARAFAC decomposition for EEM stacks using HALS with optional priors and masking.

This solver factorizes a 3-way tensor X into non-negative PARAFAC factors:

X ≈ [[A, B, C]]

where: - A (n_samples × rank) contains sample scores / concentrations, - B (n_ex × rank) contains excitation loadings, - C (n_em × rank) contains emission loadings.

Optional features supported by this implementation: - Sparse elementwise masking. - Quadratic priors on any factor (A/B/C), with NaNs allowed to skip entries. - Elastic-net regularization on any factor (L1/L2 mix). - A ratio constraint on paired rows of A: A[idx_top] ≈ beta * A[idx_bot], controlled by lam and estimated per component (beta).

Parameters:

Name	Type	Description	Default
`tensor`	`array-like, shape (I, J, K)`	Input tensor (EEM stack). Non-negative values are expected.	required
`rank`	`int`	Number of CP components.	required
`prior_dict_A`	`dict[int, array-like]`	Priors for columns of `A` (sample-mode factor). Each prior vector must have length `I`. NaNs indicate entries without a prior.	`None`
`prior_dict_B`	`dict[int, array-like]`	Priors for columns of `B` (excitation factor), length `J`.	`None`
`prior_dict_C`	`dict[int, array-like]`	Priors for columns of `C` (emission factor), length `K`.	`None`
`gamma_A`	`float or dict[int, float]`	Prior weight(s) for `A`.	`0`
`gamma_B`	`float or dict[int, float]`	Prior weight(s) for `B`.	`0`
`gamma_C`	`float or dict[int, float]`	Prior weight(s) for `C`.	`0`
`ref_components`	`dict[int, array-like]`	Reference components used to permute factors for consistent component order. Each value should be a flattened (J*K,) reference spectrum (e.g., outer(B,C)).	`None`
`alpha_A`	`float`	Elastic-net strength for `A`.	`0`
`alpha_B`	`float`	Elastic-net strength for `B`.	`0`
`alpha_C`	`float`	Elastic-net strength for `C`.	`0`
`l1_ratio`	`float`	Elastic-net mixing parameter in [0, 1].	`0`
`lam`	`float or dict[int, float]`	Ratio-penalty weight(s) for the sample-mode factor `A`. If provided together with `idx_top` and `idx_bot`, `beta` is estimated per component.	`0`
`idx_top`	`sequence[int]`	Row indices in `A` corresponding to the "top/original" samples in paired data.	`None`
`idx_bot`	`sequence[int]`	Row indices in `A` corresponding to the "bottom/perturbed" samples in paired data.	`None`
`max_iter_als`	`int`	Maximum number of outer ALS iterations.	`200`
`max_iter_nnls`	`int`	Maximum number of inner HALS iterations for each NNLS subproblem.	`500`
`tol`	`float`	Relative convergence tolerance on masked reconstruction error.	`1e-09`
`eps`	`float`	Small positive constant for numerical stability and positivity floor.	`1e-08`
`init`	`str, {"random", "svd", "nndsvd", "nndsvda", "nndsvdar", "ordinary_cp", "custom"}`	Initialization method for factors.	`'svd'`
`custom_init`	`tuple(A, B, C)`	Custom initialization, only used when `init="custom"`.	`None`
`random_state`	`int or None`	Random seed for reproducible initialization.	`None`
`fixed_components`	`list[int]`	Component indices to keep fixed during updates (useful when seeding known spectra).	`None`
`mask`	`array-like, shape (I, J, K)`	Elementwise mask (1 = keep / valid entry, 0 = ignore entry). If None, finite entries of `tensor` are treated as valid.	`None`

Returns:

Name	Type	Description
`A`	`(ndarray, shape(I, rank))`	Sample-mode factor (scores / concentrations).
`B`	`(ndarray, shape(J, rank))`	Excitation factor loadings.
`C`	`(ndarray, shape(K, rank))`	Emission factor loadings.
`beta`	`(ndarray or None, shape(rank))`	Estimated per-component ratio between paired sample rows when the ratio penalty is enabled; otherwise None.

replace_factor_with_prior

replace_factor_with_prior(factors, prior, replaced_mode, replaced_rank='best-fit', frozen_rank=None, project_prior=True, X=None, show_replaced_rank=False)

Replace a single component in a two-factor decomposition with a prior vector.

This helper is primarily used for NMF/PARAFAC initialization, where a loading is replaced by a prior vector. This is particularly useful when the user has prior knowledge of some of the factors while keeping other unknown factors initialized via standard methods. It can (i) select which component to replace by maximizing correlation with the prior ("best-fit"), (ii) directly replace a specified component, and (iii) optionally project the prior to the other factor to keep the reconstruction consistent. E.g., when replacing a loading in one mode, the corresponding loading in the other mode is updated via a non-negative least-squares projection.

Notes

Replacement is performed in-place on factors.
When project_prior=True, the method computes a residual excluding the replaced component and projects the prior onto the other mode using a non-negative least-squares-style clipping.

Parameters:

Name	Type	Description	Default
`factors`	`list[np.ndarray]`	Factor matrices for a rank-`r` matrix model. Expected shapes are `[F0, F1]` where `F0.shape = (m, r)` and `F1.shape = (n, r)`.	required
`prior`	`array-like, shape (m,) or (n,)`	Prior vector to insert into the selected component of `factors[replaced_mode]`.	required
`replaced_mode`	`int`	Which factor to modify: 0 or 1.	required
`replaced_rank`	`int or {"best-fit"}`	Component index to replace. If "best-fit", the component with the highest Pearson correlation to `prior` is replaced (excluding `frozen_rank`).	`'best-fit'`
`frozen_rank`	`int or None`	Component index that must not be replaced (useful when one component is fixed).	`None`
`project_prior`	`bool`	If True, also update the complementary factor column by projecting the prior onto the residual (requires `X`).	`True`
`X`	`np.ndarray`	Data matrix being factorized, shape (m, n). Required if `project_prior=True`.	`None`
`show_replaced_rank`	`bool`	If True, also return the selected `replaced_rank`.	`False`

Returns:

Name	Type	Description
`factors`	`list[ndarray]`	Updated factor matrices.
`replaced_rank`	`(int, optional)`	Only returned when `show_replaced_rank=True`.

solve_W

solve_W(X1, H, X2=None, beta=None, reg=0.0, non_negativity=True)

Solve for the sample-score matrix W in a (possibly constrained) linear regression subproblem.

This helper solves for W in one of the following least-squares problems:

1) Standard regression (single dataset): minimize_W || X1 - W @ H ||_F^2

2) Paired / ratio-augmented regression (two datasets): minimize_W || X1 - W @ H ||_F^2 + || X2 - W @ diag(beta) @ H ||_F^2

Optionally, non-negativity can be enforced via row-wise NNLS.

Parameters:

Name	Type	Description	Default
`X1`	`np.ndarray, shape (m, n)`	Primary data matrix.	required
`H`	`np.ndarray, shape (r, n)`	Component matrix (rows are components).	required
`X2`	`np.ndarray, shape (m, n)`	Secondary data matrix used when `beta` is provided.	`None`
`beta`	`np.ndarray, shape (r,)`	Per-component scaling applied to the secondary term. If None, the second term is omitted.	`None`
`reg`	`float`	Ridge (L2) regularization strength added to the normal equations (0 disables).	`0.0`
`non_negativity`	`bool`	If True, enforce `W >= 0` using NNLS. If False, solve the unconstrained normal equations.	`True`

Returns:

Name	Type	Description
`W`	`(ndarray, shape(m, r))`	Estimated sample-score / concentration matrix.

unfolded_eem_stack_initialization

unfolded_eem_stack_initialization(M, rank, method='nndsvd')

Initialize non-negative matrix factors from an unfolded EEM stack.

Given a non-negative matrix M (typically an unfolding of a 3D EEM stack), this routine produces an initialization (U, V) suitable for NMF / HALS updates.

Supported initializations include: - ordinary_nmf : U, V generated through scikit-learn NMF with NNDSVD init. - svd : absolute-value SVD-based factors (non-negative by clipping). - nndsvd, nndsvda, nndsvdar : NNDSVD variants (Boutsidis & Gallopoulos), where zeros are kept (nndsvd), filled with the mean (nndsvda), or filled with small random values (nndsvdar).

Parameters:

Name	Type	Description	Default
`M`	`np.ndarray, shape (m, n)`	Non-negative matrix to factorize (e.g., unfolded tensor).	required
`rank`	`int`	Number of components (latent factors).	required
`method`	`str, {"ordinary_nmf", "svd", "nndsvd", "nndsvda", "nndsvdar"}`	Initialization method. Default is "nndsvd".	`'nndsvd'`

Returns:

Name	Type	Description
`U`	`(ndarray, shape(m, rank))`	Left factor initialization.
`V`	`(ndarray, shape(rank, n))`	Right factor initialization.

References

[1] Boutsidis, Christos, and Efstratios Gallopoulos. "SVD based initialization: A head start for nonnegative matrix factorization." Pattern recognition 41.4 (2008): 1350-1362.

update_beta_in_hals

update_beta_in_hals(W: ndarray, idx_top, idx_bot, eps: float = 0, boundaries: tuple = (0.95, 2)) -> np.ndarray

Estimate per-component ratios between paired sample rows.

Given a factor matrix W and paired row indices (idx_top, idx_bot), this function estimates a ratio beta[j] for each component j such that:

W[idx_top, j] ≈ beta[j] * W[idx_bot, j]

The least-squares solution is computed independently per component and then clamped to a user-provided interval.

Parameters:

Name	Type	Description	Default
`W`	`np.ndarray, shape (m, r)`	Score / concentration matrix with `m` samples and `r` components.	required
`idx_top`	`sequence[int]`	Row indices corresponding to the "top/original" samples.	required
`idx_bot`	`sequence[int]`	Row indices corresponding to the "bottom/perturbed" samples. Must have the same length as `idx_top`.	required
`eps`	`float`	Small constant added to the denominator to avoid division by zero.	`0`
`boundaries`	`tuple[float, float]`	(min_beta, max_beta) bounds used to clamp each estimated ratio.	`(0.95, 2)`

Returns:

Name	Type	Description
`beta`	`(ndarray, shape(r))`	Estimated ratio for each component, clamped to `[min_beta, max_beta]`.

update_column_in_hals

update_column_in_hals(Rk, hk_norm2, beta_k, lam, k, prior_dict=None, gamma=0.0, alpha=0.0, l1_ratio=0.0, idx_top=None, idx_bot=None, eps=1e-08)

HALS-style update for one column with a paired-row ratio penalty.

This update is designed for sample-mode factors (e.g., W in NMF or A in PARAFAC) when the dataset contains paired samples (top/bottom) that are expected to follow a component-wise ratio:

w[top_i] ≈ beta_k * w[bot_i]

The subproblem solved for a single component column w_k is:

1/2 || Rk - d * w_k ||_2^2

(lam/2) * sum_i (w[top_i] - beta_k * w[bot_i])^2
(gamma/2) * ||mask * (w_k - p_k)||_2^2
alpha * [ l1_ratio * ||w_k||_1 + (1 - l1_ratio)/2 * ||w_k||_2^2 ]

where NaNs in the prior are ignored (masked out).

Parameters:

Name	Type	Description	Default
`Rk`	`np.ndarray, shape (m,)`	Current HALS residual for component `k` (with the k-th contribution restored).	required
`hk_norm2`	`float`	Squared norm of the corresponding component in the other factor(s) (i.e., the diagonal element `d` of the Gram matrix).	required
`beta_k`	`float`	Current ratio estimate for component `k`.	required
`lam`	`float`	Ratio penalty weight for component `k`.	required
`k`	`int`	Component index (used to look up `prior_dict[k]`).	required
`prior_dict`	`dict[int, array-like]`	Mapping from component index to a prior vector of length `m` (NaNs allowed).	`None`
`gamma`	`float`	Prior weight.	`0.0`
`alpha`	`float`	Elastic-net strength.	`0.0`
`l1_ratio`	`float`	Elastic-net mixing parameter in [0, 1].	`0.0`
`idx_top`	`sequence[int]`	Indices of "top/original" rows in the paired sample design.	`None`
`idx_bot`	`sequence[int]`	Indices of "bottom/perturbed" rows paired with `idx_top`.	`None`
`eps`	`float`	Small constant used as a positivity floor and for numerical stability.	`1e-08`

Returns:

Name	Type	Description
`w_new`	`(ndarray, shape(m))`	Updated non-negative column for component `k`.