eem_processing

eempy.eem_processing

EEMDataset

Build an EEM dataset.

Parameters:

Name	Type	Description	Default
`eem_stack`	`np.ndarray`	The 3D EEM stack, with shape (n_samples, n_ex_wavelengths, n_em_wavelengths).	required
`ex_range`	`np.ndarray`	A 1D NumPy array of the excitation wavelengths.	required
`em_range`	`np.ndarray`	A 1D NumPy array of the emission wavelengths.	required
`index`	`list or None`	Optional. The name used to label each sample. The number of elements in the list should equal the number of samples in the eem_stack (with the same sample order).	`None`
`ref`	`pd.DataFrame or None`	Optional. The reference data, e.g., the contaminant concentrations in each sample. It should have a length equal to the number of samples in the eem_stack. The index of each sample should be the name given in parameter "index". It is possible to have more than one column. NaN is allowed (for example, if contaminant concentrations in specific samples are unknown).	`None`
`cluster`	`list or None`	Optional. The classification of samples, e.g., the output of EEM clustering algorithms. The number of elements in the list should equal the number of samples in the eem_stack (with the same sample order).	`None`
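
As a rough illustration of the expected inputs, the sketch below builds a synthetic stack with matching wavelength axes and per-sample labels. The array sizes, sample names, and cluster labels are made up for this example; only the shape relationships follow the parameter table above. The constructor call is left as a comment because it needs eempy to be installed.

```python
import numpy as np

n_samples, n_ex, n_em = 4, 6, 10  # arbitrary sizes, for illustration only

ex_range = np.linspace(250, 300, n_ex)  # excitation wavelengths (nm)
em_range = np.linspace(300, 480, n_em)  # emission wavelengths (nm)

rng = np.random.default_rng(0)
eem_stack = rng.random((n_samples, n_ex, n_em))  # (n_samples, n_ex, n_em)

index = [f"sample_{i}" for i in range(n_samples)]  # one label per sample
cluster = [0, 0, 1, 1]                             # one class per sample

# Shape checks implied by the parameter descriptions.
assert eem_stack.shape == (n_samples, len(ex_range), len(em_range))
assert len(index) == eem_stack.shape[0]
assert len(cluster) == eem_stack.shape[0]

# With eempy installed, the dataset would then be built roughly as:
# from eempy.eem_processing import EEMDataset
# dataset = EEMDataset(eem_stack, ex_range, em_range, index=index, cluster=cluster)
```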

aqy

aqy(abs_stack, ex_range_abs, target_ex=None)

Calculate the apparent quantum yield (AQY).

Parameters:

Name	Type	Description	Default
`abs_stack`	`ndarray`	absorbance spectra stack	required
`ex_range_abs`	`ndarray`	excitation wavelengths of absorbance spectra	required
`target_ex`	`float or None`	excitation wavelength for AQY. If None is passed, all excitation wavelengths will be returned.	`None`

Returns:

Name	Type	Description
`aqy`	`DataFrame`	apparent quantum yield (AQY)

bix

bix()

Calculate the biological index (BIX).

Returns:

Name	Type	Description
`bix`	`DataFrame`	BIX

correlation

correlation(variables, fit_intercept=True)

Analyze the correlation between reference and fluorescence intensity at each pair of ex/em.

Parameters:

Name	Type	Description	Default
`variables`	`list`	List of variables (i.e., the headers of the reference table) to be fitted	required
`fit_intercept`	`bool`	Whether to fit the intercept for linear regression.	`True`

Returns:

Name	Type	Description
`corr_dict`	`dict`	A dictionary containing multiple correlation evaluation metrics.

cutting

cutting(ex_min, ex_max, em_min, em_max, inplace=True)

Cut every EEM in the dataset to a new excitation/emission window.

Parameters:

Name	Type	Description	Default
`ex_min`	`float`	Lower bound of the excitation wavelength window to keep (nm).	required
`ex_max`	`float`	Upper bound of the excitation wavelength window to keep (nm).	required
`em_min`	`float`	Lower bound of the emission wavelength window to keep (nm).	required
`em_max`	`float`	Upper bound of the emission wavelength window to keep (nm).	required
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset after cutting. The dataset's `ex_range` and `em_range` are updated accordingly.
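
The windowing idea can be sketched with plain NumPy boolean masks. This is not eempy's implementation: the helper name `cut_window` is hypothetical, and treating the bounds as inclusive is an assumption.

```python
import numpy as np

def cut_window(eem_stack, ex_range, em_range, ex_min, ex_max, em_min, em_max):
    """Keep pixels with ex in [ex_min, ex_max] and em in [em_min, em_max].

    Inclusive bounds are an assumption made for this sketch.
    """
    ex_keep = (ex_range >= ex_min) & (ex_range <= ex_max)
    em_keep = (em_range >= em_min) & (em_range <= em_max)
    stack_cut = eem_stack[:, ex_keep, :][:, :, em_keep]
    # The wavelength axes are trimmed together with the data.
    return stack_cut, ex_range[ex_keep], em_range[em_keep]

ex_range = np.arange(240, 301, 10)  # 240, 250, ..., 300
em_range = np.arange(300, 501, 50)  # 300, 350, ..., 500
eem_stack = np.ones((2, ex_range.size, em_range.size))

stack_cut, ex_cut, em_cut = cut_window(
    eem_stack, ex_range, em_range, ex_min=250, ex_max=280, em_min=350, em_max=450
)
print(stack_cut.shape)  # (2, 4, 3)
```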

fi

fi()

Compute the fluorescence index (FI) for each sample.

FI is computed as intensity(ex=370 nm, em=470 nm) divided by intensity(ex=370 nm, em=520 nm).

Returns:

Name	Type	Description
`fi`	`DataFrame`	Fluorescence index values. Note: the current implementation labels the output column as "BIX" even though the values correspond to FI.

filter_by_cluster

filter_by_cluster(cluster_names, inplace=True)

Select the samples that belong to certain cluster(s).

Parameters:

Name	Type	Description	Default
`cluster_names`	`int/float/str or list of int/float/str`	Name(s) of the cluster(s) to keep.	required
`inplace`	`bool`	If True, overwrite the EEMDataset object.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	The filtered EEM dataset.

filter_by_index

filter_by_index(mandatory_keywords, optional_keywords, inplace=True)

Select the samples whose indexes contain the given keywords.

Parameters:

Name	Type	Description	Default
`mandatory_keywords`	`str or list of str`	Keywords for selecting samples whose indexes contain all the mandatory keywords.	required
`optional_keywords`	`str or list of str`	Keywords for selecting samples whose indexes contain any of the optional keywords.	required
`inplace`	`bool`	If True, overwrite the EEMDataset object.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	The filtered EEM dataset.
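
The "all mandatory, any optional" rule can be sketched in pure Python. The helper `select_by_keywords` is hypothetical. Requiring both conditions at once, and ignoring an empty optional list, are assumptions about how the two keyword sets combine.

```python
def select_by_keywords(index, mandatory_keywords, optional_keywords):
    """Return positions of labels matching the keyword rules.

    Assumption: a label is kept when it contains ALL mandatory keywords and,
    if any optional keywords are given, at least ONE of them.
    """
    def as_list(keywords):
        if keywords is None:
            return []
        return [keywords] if isinstance(keywords, str) else list(keywords)

    mandatory = as_list(mandatory_keywords)
    optional = as_list(optional_keywords)

    selected = []
    for position, name in enumerate(index):
        has_all = all(kw in name for kw in mandatory)
        has_any = any(kw in name for kw in optional) if optional else True
        if has_all and has_any:
            selected.append(position)
    return selected

index = ["site1_raw", "site1_quenched", "site2_raw", "blank_raw"]
selected = select_by_keywords(index, "raw", ["site1", "site2"])
print(selected)  # [0, 2]
```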

gaussian_filter

gaussian_filter(sigma=1, truncate=3, inplace=True)

Apply Gaussian filtering to every EEM in the dataset.

Parameters:

Name	Type	Description	Default
`sigma`	`float`	Standard deviation of the Gaussian kernel.	`1`
`truncate`	`float`	Truncate the filter at this many standard deviations.	`3`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset with Gaussian filtering applied.

hix

hix()

Calculate the humification index (HIX).

Returns:

Name	Type	Description
`hix`	`DataFrame`	HIX

ife_correction

ife_correction(absorbance, ex_range_abs, inplace=True)

Apply inner filter effect (IFE) correction to every EEM using absorbance spectra.

Parameters:

Name	Type	Description	Default
`absorbance`	`ndarray`	Absorbance spectra stack (n_samples, n_abs_wavelengths).	required
`ex_range_abs`	`ndarray`	Wavelength axis (nm) for the absorbance spectra.	required
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	IFE-corrected EEM dataset.
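
For orientation, a commonly used absorbance-based correction multiplies each pixel by 10^((A_ex + A_em) / 2). Whether eempy uses exactly this expression is not stated here, so the sketch below is an assumption, as is the helper name `ife_correct`. It also assumes `ex_range_abs` is sorted in ascending order and covers both the excitation and emission wavelengths.

```python
import numpy as np

def ife_correct(eem_stack, ex_range, em_range, absorbance, ex_range_abs):
    """Apply F_corr = F_obs * 10 ** ((A_ex + A_em) / 2) to each sample."""
    corrected = np.empty_like(eem_stack, dtype=float)
    for i in range(eem_stack.shape[0]):
        # Interpolate the absorbance spectrum onto both EEM axes.
        a_ex = np.interp(ex_range, ex_range_abs, absorbance[i])
        a_em = np.interp(em_range, ex_range_abs, absorbance[i])
        factor = 10 ** ((a_ex[:, None] + a_em[None, :]) / 2)
        corrected[i] = eem_stack[i] * factor
    return corrected

ex_range = np.array([250.0, 300.0])
em_range = np.array([300.0, 400.0, 500.0])
ex_range_abs = np.array([200.0, 600.0])
absorbance = np.array([[0.1, 0.1],   # sample 0: flat absorbance of 0.1
                       [0.0, 0.0]])  # sample 1: no absorbance
eem_stack = np.ones((2, 2, 3))

corrected = ife_correct(eem_stack, ex_range, em_range, absorbance, ex_range_abs)
```

With zero absorbance the sample is unchanged, while a flat absorbance of 0.1 scales every pixel by 10^0.1.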

interpolation

interpolation(ex_range_new, em_range_new, method, inplace=True)

Interpolate every EEM onto a new excitation/emission wavelength grid.

Parameters:

Name	Type	Description	Default
`ex_range_new`	`ndarray`	Target excitation wavelength axis (nm).	required
`em_range_new`	`ndarray`	Target emission wavelength axis (nm).	required
`method`	`(str, {linear, nearest, slinear, cubic, quintic})`	Interpolation method passed to `scipy.interpolate.RegularGridInterpolator`.	required
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset interpolated to the new wavelength grid. The dataset's `ex_range` and `em_range` are updated accordingly.

mean

mean()

Calculate mean of each pixel over all samples.

Returns:

Name	Type	Description
`mean`	`ndarray`

median_filter

median_filter(window_size=(3, 3), mode='reflect', inplace=True)

Apply median filtering to every EEM in the dataset.

Parameters:

Name	Type	Description	Default
`window_size`	`tuple of two integers`	Gives the shape that is taken from the input array, at every element position, to define the input to the filter function.	`(3, 3)`
`mode`	`str, {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}`	Determines how the input array is extended beyond its boundaries.	`'reflect'`
`inplace`	`bool`	If True, overwrite the EEMDataset object with the processed EEMs.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	The processed EEM dataset.

nan_imputing

nan_imputing(method='linear', fill_value='linear_ex', inplace=True)

Impute NaN pixels in every EEM in the dataset.

Parameters:

Name	Type	Description	Default
`method`	`(str, {linear, cubic})`	2D interpolation method passed to `scipy.interpolate.griddata`.	`"linear"`
`fill_value`	`(float or str, {linear_ex, linear_em})`	How to fill pixels outside the convex hull of non-NaN data.	`"linear_ex"`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset with NaN pixels filled.

peak_picking

peak_picking(ex, em)

Return the fluorescence intensities at the location closest to the given (ex, em).

Parameters:

Name	Type	Description	Default
`ex`	`float or int`	excitation wavelength of the wanted location	required
`em`	`float or int`	emission wavelength of the wanted location	required

Returns:

Name	Type	Description
`fi`	`DataFrame`	table of fluorescence intensities at the wanted location for all samples
`ex_actual`		the actual ex of the extracted fluorescence intensities
`em_actual`		the actual em of the extracted fluorescence intensities
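
A nearest-grid-point lookup of this kind can be sketched as follows. The helper `pick_peak` is hypothetical and returns plain arrays rather than a DataFrame. The usage part reuses the same lookup to form the 470/520 nm ratio at ex = 370 nm described for `fi`.

```python
import numpy as np

def pick_peak(eem_stack, ex_range, em_range, ex, em):
    """Return intensities at the grid point closest to (ex, em)."""
    i_ex = int(np.argmin(np.abs(ex_range - ex)))
    i_em = int(np.argmin(np.abs(em_range - em)))
    return eem_stack[:, i_ex, i_em], float(ex_range[i_ex]), float(em_range[i_em])

ex_range = np.array([360.0, 370.0, 380.0])
em_range = np.array([450.0, 470.0, 500.0, 520.0])
eem_stack = np.zeros((2, 3, 4))
eem_stack[:, 1, 1] = [3.0, 6.0]  # intensities at (370, 470)
eem_stack[:, 1, 3] = [2.0, 3.0]  # intensities at (370, 520)

# The requested location (372, 468) is snapped to the grid point (370, 470).
intensity, ex_actual, em_actual = pick_peak(eem_stack, ex_range, em_range, 372, 468)

# Ratio of two picked intensities, as in the FI definition.
i_470, _, _ = pick_peak(eem_stack, ex_range, em_range, 370, 470)
i_520, _, _ = pick_peak(eem_stack, ex_range, em_range, 370, 520)
fi_values = i_470 / i_520
```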

raman_normalization

raman_normalization(ex_range_blank=None, em_range_blank=None, blank=None, from_blank=False, integration_time=1, ex_target=350, bandwidth=5, rsu_standard=20000, manual_rsu=1, inplace=True)

Normalize every EEM in the dataset by a Raman scattering unit (RSU). RSU can be supplied directly (from_blank=False) or calculated from blank EEM data (from_blank=True). The normalization factor is RSU_raw divided by (rsu_standard * integration_time).

Parameters:

Name	Type	Description	Default
`blank`	`ndarray`	Blank EEM(s) used to estimate RSU when `from_blank=True`.	`None`
`ex_range_blank`	`ndarray`	Excitation wavelength axis for the blank EEM(s).	`None`
`em_range_blank`	`ndarray`	Emission wavelength axis for the blank EEM(s).	`None`
`from_blank`	`bool`	If True, calculate RSU from the provided blank EEM(s).	`False`
`integration_time`	`float`	Integration time used for the blank measurement.	`1`
`ex_target`	`float`	Excitation wavelength (nm) at which RSU is computed.	`350`
`bandwidth`	`float`	Raman peak bandwidth (nm) used for regional integration.	`5`
`rsu_standard`	`float`	Scaling factor applied to RSU to control the magnitude of normalized intensities.	`20000`
`manual_rsu`	`float`	RSU used directly when `from_blank=False`.	`1`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	Raman-normalized EEM dataset.
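
The arithmetic of the normalization factor can be worked through with a small example. The helper `raman_normalize` is hypothetical, and dividing the EEMs by the factor is an assumption about how the factor is applied. Estimating RSU_raw from a blank is not covered.

```python
import numpy as np

def raman_normalize(eem_stack, rsu_raw, rsu_standard=20000, integration_time=1):
    """Divide the stack by rsu_raw / (rsu_standard * integration_time)."""
    factor = rsu_raw / (rsu_standard * integration_time)
    return eem_stack / factor, factor

eem_stack = np.full((2, 3, 4), 10.0)
normalized, factor = raman_normalize(
    eem_stack, rsu_raw=40000, rsu_standard=20000, integration_time=1
)
print(factor)               # 2.0
print(normalized[0, 0, 0])  # 5.0
```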

raman_scattering_removal

raman_scattering_removal(width=5, interpolation_method='linear', interpolation_dimension='2d', inplace=True, recover_original_nan=True)

Remove the first-order Raman scattering band and fill the masked region.

Parameters:

Name	Type	Description	Default
`width`	`float`	Total width (nm) of the Raman scattering band to mask.	`5`
`interpolation_method`	`(str, {linear, cubic, nan, zero})`	Method used to fill the masked region.	`"linear"`
`interpolation_dimension`	`(str, {'1d-ex', '1d-em', '2d'})`	Interpolation axis/dimension used when `interpolation_method` is not "nan" or "zero".	`"2d"`
`recover_original_nan`	`bool`	If True, preserve NaN pixels that existed before scattering removal.	`True`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset with Raman scattering removed and filled.

rayleigh_scattering_removal

rayleigh_scattering_removal(width_o1=15, width_o2=15, interpolation_dimension_o1='2d', interpolation_dimension_o2='2d', interpolation_method_o1='zero', interpolation_method_o2='linear', inplace=True, recover_original_nan=True)

Remove first- and second-order Rayleigh scattering bands and fill the masked regions.

Parameters:

Name	Type	Description	Default
`width_o1`	`float`	Total width (nm) of the first-order Rayleigh band (Em = Ex).	`15`
`width_o2`	`float`	Total width (nm) of the second-order Rayleigh band (Em = 2*Ex).	`15`
`interpolation_dimension_o1`	`(str, {'1d-ex', '1d-em', '2d'})`	Interpolation axis/dimension for the first-order band.	`"2d"`
`interpolation_dimension_o2`	`(str, {'1d-ex', '1d-em', '2d'})`	Interpolation axis/dimension for the second-order band.	`"2d"`
`interpolation_method_o1`	`(str, {linear, cubic, nan, zero, none})`	Fill method for the first-order band.	`"zero"`
`interpolation_method_o2`	`(str, {linear, cubic, nan, zero, none})`	Fill method for the second-order band.	`"linear"`
`recover_original_nan`	`bool`	If True, preserve NaN pixels that existed before scattering removal.	`True`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset with Rayleigh scattering removed and filled.
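
The two band definitions (Em = Ex and Em = 2*Ex) translate into boolean masks over the excitation/emission grid. In this sketch the helper `rayleigh_masks` is hypothetical, and reading "total width" as half the width on each side of the band center is an assumption. Only the "zero" and "nan" fills are shown; interpolation-based fills are omitted.

```python
import numpy as np

def rayleigh_masks(ex_range, em_range, width_o1=15, width_o2=15):
    """Boolean masks (n_ex, n_em) for first- and second-order Rayleigh bands."""
    ex_grid, em_grid = np.meshgrid(ex_range, em_range, indexing="ij")
    mask_o1 = np.abs(em_grid - ex_grid) <= width_o1 / 2      # around Em = Ex
    mask_o2 = np.abs(em_grid - 2 * ex_grid) <= width_o2 / 2  # around Em = 2*Ex
    return mask_o1, mask_o2

ex_range = np.array([250.0, 300.0])
em_range = np.array([250.0, 300.0, 500.0, 600.0])
mask_o1, mask_o2 = rayleigh_masks(ex_range, em_range)

eem = np.ones((2, 4))
eem_masked = eem.copy()
eem_masked[mask_o1] = 0.0     # "zero" fill for the first-order band
eem_masked[mask_o2] = np.nan  # "nan" fill for the second-order band
```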

region_masking

region_masking(ex_min, ex_max, em_min, em_max, fill_value='nan', inplace=True)

Mask a rectangular excitation/emission region in every EEM in the dataset.

Parameters:

Name	Type	Description	Default
`ex_min`	`float`	Lower bound of the excitation wavelength window to mask (nm).	`230`
`ex_max`	`float`	Upper bound of the excitation wavelength window to mask (nm).	`500`
`em_min`	`float`	Lower bound of the emission wavelength window to mask (nm).	`250`
`em_max`	`float`	Upper bound of the emission wavelength window to mask (nm).	`810`
`fill_value`	`(str, {nan, zero})`	How to fill the masked region.	`"nan"`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset with regional masking applied.

regional_integration

regional_integration(ex_min, ex_max, em_min, em_max) -> pd.DataFrame

Calculate regional integration of samples.

Parameters:

Name	Type	Description	Default
`ex_min`	`float`	The lower boundary of excitation wavelengths of the integrated region.	required
`ex_max`	`float`	The upper boundary of excitation wavelengths of the integrated region.	required
`em_min`	`float`	The lower boundary of emission wavelengths of the integrated region.	required
`em_max`	`float`	The upper boundary of emission wavelengths of the integrated region.	required

Returns:

Name	Type	Description
`integrations`	`DataFrame`

sort_by_index

sort_by_index(inplace=True)

Sort the sample order of eem_stack, index and reference (if it exists) by the index.

Parameters:

Name	Type	Description	Default
`inplace`	`bool`	If True, overwrite the EEMDataset object.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	The processed EEM dataset.

splitting

splitting(n_split, rule: str = 'random', random_state=None, kw_top=None, kw_bot=None, idx_top=None, idx_bot=None)

Split the EEM dataset into multiple sub-datasets.

Parameters:

Name	Type	Description	Default
`n_split`	`int`	The number of splits.	required
`rule`	`(str, {random, sequential})`	If 'random' is passed, the split will be generated randomly. If 'sequential' is passed, the dataset will be split according to index order.	`'random'`
`random_state`	`int`	Random seed for splitting.	`None`

Returns:

Name	Type	Description
`model_list`	`list`	A list of sub-datasets. Each of them is an EEMDataset object.
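
The difference between the two rules can be sketched on sample positions alone. The helper `split_positions` is hypothetical, and using `np.array_split` for near-equal parts is an assumption rather than the library's documented behavior.

```python
import numpy as np

def split_positions(n_samples, n_split, rule="random", random_state=None):
    """Return a list of position arrays, one per sub-dataset."""
    positions = np.arange(n_samples)
    if rule == "random":
        rng = np.random.default_rng(random_state)
        positions = rng.permutation(positions)  # shuffle before splitting
    elif rule != "sequential":
        raise ValueError("rule must be 'random' or 'sequential'")
    return np.array_split(positions, n_split)

sequential_parts = split_positions(7, 3, rule="sequential")
random_parts = split_positions(7, 3, rule="random", random_state=0)

print([part.tolist() for part in sequential_parts])  # [[0, 1, 2], [3, 4], [5, 6]]
```

In both cases every sample lands in exactly one part. Each position array could then be used to slice `eem_stack` and `index`.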

std

std()

Calculate standard deviation of each pixel over all samples.

Returns:

Name	Type	Description
`std`	`ndarray`

subsampling

subsampling(portion=0.8, inplace=True)

Randomly select a portion of the EEMs in the dataset.

Parameters:

Name	Type	Description	Default
`portion`	`float`	Fraction of the samples to select, between 0 and 1.	`0.8`
`inplace`	`bool`	If True, overwrite the EEMDataset object.	`True`

Returns:

Name	Type	Description
`eem_dataset_sub`	`ndarray`	New EEM dataset.
`selected_indices`	`ndarray`	Indices of selected EEMs.

tf_normalization

tf_normalization(inplace=True)

Normalize every EEM by its total fluorescence. Each sample is divided by its total fluorescence, normalized to the mean total fluorescence across the dataset.

Parameters:

Name	Type	Description	Default
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	Total-fluorescence-normalized EEM dataset.
`weights`	`ndarray`	Per-sample normalization factors (total fluorescence divided by the dataset mean).
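
A minimal NumPy sketch of this normalization is shown below. The helper `tf_normalize` is hypothetical, and taking total fluorescence as the NaN-ignoring sum of all pixels in a sample is an assumption.

```python
import numpy as np

def tf_normalize(eem_stack):
    """Divide each sample by (its total fluorescence / dataset mean)."""
    tf = np.nansum(eem_stack, axis=(1, 2))  # total fluorescence per sample
    weights = tf / tf.mean()                # per-sample normalization factors
    return eem_stack / weights[:, None, None], weights

eem_stack = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
normalized, weights = tf_normalize(eem_stack)

print(weights)                       # [0.5 1.5]
print(normalized.sum(axis=(1, 2)))   # [8. 8.]
```

After normalization every sample has the same total fluorescence, equal to the dataset mean before normalization.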

threshold_masking

threshold_masking(threshold, fill, mask_type='greater', inplace=True)

Mask fluorescence intensity values above or below a threshold across all samples.

Parameters:

Name	Type	Description	Default
`threshold`	`float or int`	Intensity threshold.	required
`fill`	`float or int`	Value used to replace masked pixels.	required
`mask_type`	`(str, {greater, smaller})`	Whether to mask values greater than or smaller than `threshold`.	`"greater"`
`inplace`	`bool`	If True, overwrite `self` and return it. If False, return a new EEMDataset instance.	`True`

Returns:

Name	Type	Description
`eem_dataset_new`	`EEMDataset`	EEM dataset with threshold masking applied.
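
Threshold masking amounts to an element-wise replacement, sketched here with `np.where`. The helper `threshold_mask` is hypothetical, and the strict comparisons (`>` and `<`) are an assumption.

```python
import numpy as np

def threshold_mask(eem_stack, threshold, fill, mask_type="greater"):
    """Replace pixels above (or below) the threshold with `fill`."""
    if mask_type == "greater":
        mask = eem_stack > threshold
    elif mask_type == "smaller":
        mask = eem_stack < threshold
    else:
        raise ValueError("mask_type must be 'greater' or 'smaller'")
    return np.where(mask, fill, eem_stack)

eem_stack = np.array([[[0.5, 2.0],
                       [5.0, 1.0]]])
masked = threshold_mask(eem_stack, threshold=1.5, fill=0.0, mask_type="greater")
print(masked.tolist())  # [[[0.5, 0.0], [0.0, 1.0]]]
```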

total_fluorescence

total_fluorescence()

Calculate total fluorescence of each sample.

Returns:

Name	Type	Description
`tf`	`ndarray`

variance

variance()

Calculate variance of each pixel over all samples.

Returns:

Name	Type	Description
`variance`	`ndarray`

zscore

zscore()

Calculate the z-score of each pixel over all samples.

Returns:

Name	Type	Description
`zscore`	`ndarray`
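
The pixel-wise statistics (`mean`, `std`, `variance`, `zscore`) all reduce over the sample axis. The sketch below uses plain NumPy with population statistics (`ddof=0`), which is an assumption about the library's convention.

```python
import numpy as np

# Two samples, each a 1 x 2 EEM.
eem_stack = np.array([[[1.0, 2.0]],
                      [[3.0, 6.0]]])

mean = eem_stack.mean(axis=0)      # [[2. 4.]]
std = eem_stack.std(axis=0)        # [[1. 2.]]
variance = eem_stack.var(axis=0)   # [[1. 4.]]

# z-score of every pixel relative to the per-pixel mean and std.
zscore = (eem_stack - mean) / std  # [[[-1. -1.]], [[1. 1.]]]
```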

PARAFAC

Parallel factor analysis (PARAFAC) model for an excitation–emission matrix (EEM) dataset.

This class fits a low-rank PARAFAC (CP) decomposition to a 3D EEM stack with shape (n_samples, n_ex, n_em) by factorizing it into:
- A sample-mode score matrix A with shape (n_samples, n_components).
- An excitation-mode loading matrix B with shape (n_ex, n_components).
- An emission-mode loading matrix C with shape (n_em, n_components).

Each component r corresponds to a rank-1 outer product A[:, r] ⊗ B[:, r] ⊗ C[:, r], and the reconstructed EEM stack is obtained by summing these rank-1 components over r = 1...n_components.
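
The reconstruction from the three factor matrices can be written as a single `einsum`. The factor values below are random placeholders. The second computation spells out the sum of rank-1 outer products to show that both forms agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_ex, n_em, n_components = 5, 4, 6, 2

A = rng.random((n_samples, n_components))  # sample scores
B = rng.random((n_ex, n_components))       # excitation loadings
C = rng.random((n_em, n_components))       # emission loadings

# Sum over r of A[:, r] (outer) B[:, r] (outer) C[:, r].
eem_stack_reconstructed = np.einsum("ir,jr,kr->ijk", A, B, C)

# The same result, built one rank-1 component at a time.
manual = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(n_components)
)

print(eem_stack_reconstructed.shape)  # (5, 4, 6)
```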

The fit supports optional regularization and constraints:
- Non-negativity.
- Elastic-net regularization on any factor (L1/L2 mix).
- Quadratic priors on A, B, and/or C (controlled by prior_dict_sample, prior_dict_ex, prior_dict_em and gamma_sample, gamma_ex and gamma_em), with NaNs allowed to skip entries. This is useful when fitted scores or spectral components are desired to be close (but not necessarily identical) to prior knowledge. For example, if a component’s concentration is known for some samples, a prior vector of length n_samples can be passed with real values for known samples and NaN for unknown samples.
- A ratio constraint on paired rows of A: A[idx_top] ≈ beta * A[idx_bot]. This is useful when the ratios of component amplitudes between two sets of samples are desired to be constant. For example, if each sample is measured both unquenched and quenched using a fixed quencher dosage, then for a given chemically consistent component the ratio between unquenched and quenched amplitudes may be approximately constant across samples (Hu et al., ES&T, 2025). In this case, passing the unquenched and quenched sample indices to idx_top and idx_bot encourages a constant ratio. lam controls the strength of this regularization.
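
To make the two penalty ideas concrete, the sketch below evaluates a NaN-aware quadratic prior and a ratio penalty on a small score matrix. The exact objective used by the HALS solver is not given here, so the squared-error forms, the least-squares estimate of beta, and the helper names `prior_penalty` and `ratio_penalty` are all assumptions made for illustration.

```python
import numpy as np

def prior_penalty(column, prior, gamma):
    """gamma * squared distance to the prior, skipping NaN prior entries."""
    known = ~np.isnan(prior)
    return gamma * np.sum((column[known] - prior[known]) ** 2)

def ratio_penalty(A, idx_top, idx_bot, lam):
    """lam * ||A[idx_top] - beta * A[idx_bot]||^2 with one beta per component."""
    top, bot = A[idx_top], A[idx_bot]
    # Least-squares beta for each component (column).
    beta = (top * bot).sum(axis=0) / (bot * bot).sum(axis=0)
    return beta, lam * np.sum((top - beta * bot) ** 2)

# Rows 0-1 are "top" samples, rows 2-3 their paired "bot" samples.
A = np.array([[2.0, 3.0],
              [4.0, 6.0],
              [1.0, 1.0],
              [2.0, 2.0]])
beta, penalty = ratio_penalty(A, idx_top=[0, 1], idx_bot=[2, 3], lam=1.0)

# Prior known for the first and third sample only.
prior_value = prior_penalty(
    np.array([1.0, 2.0, 3.0]), np.array([1.0, np.nan, 5.0]), gamma=0.5
)
```

Here the top rows are exact multiples of the bottom rows, so beta is [2, 3] and the ratio penalty is zero. The prior penalty only counts the two known entries.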

Parameters:

Name	Type	Description	Default
`n_components`	`int`	Number of PARAFAC components (rank of the CP decomposition).	required
`non_negativity`	`bool`	Whether to enforce non-negativity constraints on the factor matrices.	`True`
`solver`	`{'mu', 'hals'}`	Optimization algorithm used when `non_negativity=True`. - `'mu'`: Multiplicative Updates solver (`tensorly.decomposition.non_negative_parafac`). - `'hals'`: Hierarchical Alternating Least Squares solver with optional priors/regularization (`eempy.solver.parafac_with_prior_hals`). If `non_negativity=False`, a standard alternating least squares solver (`tensorly.decomposition.parafac`) is used regardless of this setting.	`'hals'`
`init`	`{'svd', 'random'} or tensorly.CPTensor`	Initialization strategy for the factor matrices. If a `tensorly.CPTensor` is provided, it is used as the initialization.	`'svd'`
`custom_init`	`optional`	Custom initialization passed to the HALS solver (when supported by the backend implementation).	`None`
`fixed_components`	`optional`	Component(s) to keep fixed during fitting (backend-specific behavior).	`None`
`tf_normalization`	`bool`	Whether to normalize each EEM by its total fluorescence during model fitting.	`False`
`loadings_normalization`	`{'sd', 'maximum', None}`	Post-fit normalization applied to excitation/emission loadings, with the sample scores scaled accordingly. - 'sd': normalize each loading vector to unit standard deviation. - 'maximum': normalize each loading vector to unit maximum. - None: no loading normalization.	`'maximum'`
`sort_components_by_em`	`bool`	Whether to sort components by the emission peak position (ascending). If `False`, components are kept in the solver output order (which may correlate with variance contribution depending on the solver).	`True`
`alpha_sample`	`float`	Regularization strength applied to the sample-mode factor matrix (backend-specific).	`0`
`alpha_ex`	`float`	Regularization strength applied to the excitation-mode factor matrix (backend-specific).	`0`
`alpha_em`	`float`	Regularization strength applied to the emission-mode factor matrix (backend-specific).	`0`
`l1_ratio`	`float`	Elastic-net mixing parameter used by the backend (`1` corresponds to L1 only; `0` to L2 only).	`1`
`prior_dict_sample`	`dict`	Prior information for the sample-mode factor matrix (backend-specific).	`None`
`prior_dict_ex`	`dict`	Prior information for the excitation-mode factor matrix (backend-specific).	`None`
`prior_dict_em`	`dict`	Prior information for the emission-mode factor matrix (backend-specific).	`None`
`gamma_sample`	`float`	Additional prior/penalty strength for the sample-mode factor matrix (backend-specific).	`0`
`gamma_ex`	`float`	Additional prior/penalty strength for the excitation-mode factor matrix (backend-specific).	`0`
`gamma_em`	`float`	Additional prior/penalty strength for the emission-mode factor matrix (backend-specific).	`0`
`ref_components`	`optional`	Reference component definitions used by the backend prior/regularization logic (backend-specific).	`None`
`kw_top`	`str`	Keyword used to identify "top" EEM from `eem_dataset.index` during fitting. "Top" and "bot" EEMs are assumed to be paired one-to-one and aligned by selection order (first "top" ↔ first "bot", etc.). A recommended naming convention is "a_sharing_sample_name" + "kw_top" or "kw_bot" for the quenched and unquenched EEM derived from the same original sample, so the pair differs only by `kw_top`/`kw_bot` and alignment is preserved when selecting by keywords. An alternative approach is to provide `idx_top` and `idx_bot` to directly specify "top" and "bot" EEMs by positions.	`None`
`kw_bot`	`str`	Keyword used to identify "bot" EEM from `eem_dataset.index` during fitting.	`None`
`idx_top`	`list of int`	0-based integer positions of samples in eem_dataset used as the numerator ("top") group (e.g., [0, 1, 2]).	`None`
`idx_bot`	`list of int`	0-based integer positions of samples in eem_dataset used as the denominator ("bot") group (e.g., [3, 4, 5]).	`None`
`lam`	`float`	Strength of ratio-based regularization between "top" and "bot" samples (backend-specific).	`0`
`max_iter_als`	`int`	Maximum number of outer ALS iterations.	`100`
`tol`	`float`	Convergence tolerance for the ALS loop.	`1e-6`
`max_iter_nnls`	`int`	Maximum number of iterations for NNLS subproblems (when used by the backend).	`500`
`random_state`	`int or numpy.random.RandomState`	Random seed or RNG used for reproducible initialization (when supported).	`None`
`mask`	`array-like`	An ideally sparse mask array for missing values (backend-specific). When provided, masked entries are ignored in fitting.	`None`

Attributes:

Name	Type	Description
`score`	`DataFrame or None`	Sample scores (sample loadings).
`ex_loadings`	`DataFrame or None`	Excitation-mode loadings for each component.
`em_loadings`	`DataFrame or None`	Emission-mode loadings for each component.
`fmax`	`DataFrame or None`	The maximum fluorescence intensity of components. Fmax is calculated by multiplying the maximum excitation loading and maximum emission loading for each component by its score.
`nnls_fmax`	`DataFrame or None`	Fmax estimated from refitting PARAFAC components to the original EEMs using NNLS. It may be slightly different from `fmax` due to the non-exact fit.
`components`	`ndarray or None`	Component EEMs with shape `(n_components, n_ex, n_em)` constructed from excitation/emission loadings.
`cptensors`	`CPTensor or None`	Fitted CP/PARAFAC tensor representation returned by the underlying solver.
`eem_stack_train`	`ndarray or None`	EEM stack used for model fitting, with shape `(n_samples, n_ex, n_em)`.
`eem_stack_reconstructed`	`ndarray or None`	Reconstructed EEM stack from the fitted model, with shape `(n_samples, n_ex, n_em)`.
`ex_range`	`ndarray or None`	Excitation wavelength grid corresponding to `ex_loadings` and `components`.
`em_range`	`ndarray or None`	Emission wavelength grid corresponding to `em_loadings` and `components`.
`beta`	`ndarray or None`	Component-wise ratio parameters used when ratio regularization / beta fitting is enabled.
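
The `fmax` definition given in the table above can be checked with a small worked example. The loadings and scores are invented numbers, and plain arrays stand in for the DataFrames held by the model.

```python
import numpy as np

# Two samples, two components.
score = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# Columns are components; rows are wavelengths.
ex_loadings = np.array([[0.5, 1.0],
                        [1.0, 0.2],
                        [0.3, 0.1]])  # column maxima: [1.0, 1.0]
em_loadings = np.array([[0.2, 0.5],
                        [2.0, 0.4]])  # column maxima: [2.0, 0.5]

# Fmax = score * max(excitation loading) * max(emission loading), per component.
fmax = score * ex_loadings.max(axis=0) * em_loadings.max(axis=0)
print(fmax.tolist())  # [[2.0, 1.0], [6.0, 2.0]]
```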

References

[1] TensorLy documentation for CP/PARAFAC decomposition.
[2] Hu, Yongmin, Céline Jacquin, and Eberhard Morgenroth. "Fluorescence Quenching as a Diagnostic Tool for Prediction Reliability Assessment and Anomaly Detection in EEM-Based Water Quality Monitoring." Environmental Science & Technology 59.36 (2025): 19490-19501.

component_peak_locations

component_peak_locations()

Get the ex/em of component peaks.

Returns:

Name	Type	Description
`max_exem`	`list`	A list of (ex, em) of component peaks.

core_consistency

core_consistency()

Calculate the core consistency of the established PARAFAC model.

Returns:

Name	Type	Description
`cc`	`float`	core consistency

export

export(filepath, info_dict)

Export the PARAFAC model to a text file that can be uploaded to the online PARAFAC model database OpenFluor (https://openfluor.lablicate.com/#).

Parameters:

Name	Type	Description	Default
`filepath`	`str`	Location of the saved text file. Please specify the ".csv" extension.	required
`info_dict`	`dict`	A dictionary containing the model information. Possible keys include: name, creator, date, email, doi, reference, unit, toolbox, fluorometer, nSample, decomposition_method, validation, dataset_calibration, preprocess, sources, description.	required

Returns:

Name	Type	Description
`info_dict`	`dict`	A dictionary containing the information of the PARAFAC model.

fit

fit(eem_dataset: EEMDataset)

Establish a PARAFAC model based on a given EEM dataset.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	The EEM dataset used to fit the PARAFAC model.	required

Returns:

Name	Type	Description
`self`	`object`	The established PARAFAC model

leverage

leverage(mode: str = 'sample')

Calculate the leverage of a selected mode.

Parameters:

Name	Type	Description	Default
`mode`	`(str, {ex, em, sample})`	The mode of which the leverage is calculated.	`'sample'`

Returns:

Name	Type	Description
`lvr`	`DataFrame`	The table of leverage
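
Leverage is conventionally the diagonal of the hat matrix built from a factor matrix, i.e. diag(F (FᵀF)⁻¹ Fᵀ). Assuming the method follows this standard definition (not confirmed here), it can be sketched as below. The helper name `leverage_of` and the random factor matrix are illustrative only.

```python
import numpy as np

def leverage_of(factor):
    """Diagonal of the hat matrix F (F^T F)^-1 F^T for a factor matrix F."""
    hat = factor @ np.linalg.pinv(factor.T @ factor) @ factor.T
    return np.diag(hat)

rng = np.random.default_rng(0)
scores = rng.random((5, 2))  # e.g. sample-mode factor, 5 samples x 2 components
lvr = leverage_of(scores)

# Each leverage lies in [0, 1]; for a full-rank factor they sum to n_components.
print(lvr.sum())
```

Passing the excitation or emission loading matrix instead of the scores would give the leverage of each wavelength in that mode.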

predict

predict(eem_dataset: EEMDataset, fit_intercept=False, fit_beta=False, idx_top=None, idx_bot=None)

Predict the score and Fmax of a given EEM dataset using the fitted components. This method can be applied to a new EEM dataset independent of the one used to establish the PARAFAC model.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	The EEM dataset to be predicted.	required
`fit_intercept`	`bool`	Whether to calculate the intercept.	`False`
`fit_beta`	`bool`	Whether to fit the beta parameter (the proportions between "top" and "bot" samples).	`False`
`idx_top`	`list`	List of indices of samples serving as numerators in ratio calculation.	`None`
`idx_bot`	`list`	List of indices of samples serving as denominators in ratio calculation.	`None`

Returns:

Name	Type	Description
`score_sample`	`DataFrame`	The fitted score.
`fmax_sample`	`DataFrame`	The fitted Fmax.
`eem_stack_pred`	`np.ndarray (3d)`	The EEM dataset reconstructed.

residual

residual()

Get the residual of the established PARAFAC model, i.e., the difference between the original EEM dataset and the reconstructed EEM dataset.

Returns:

Name	Type	Description
`res`	`np.ndarray (3d)`	the residual

sample_relative_rmse

sample_relative_rmse()

Calculate the normalized root mean squared error (normalized RMSE) of the EEM of each sample. It is defined as the RMSE divided by the mean of the original signal.

Returns:

Name	Type	Description
`relative_rmse`	`DataFrame`	Table of normalized RMSE
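
The per-sample RMSE and its normalized form can be sketched directly from the definition above. The helper names below are illustrative stand-ins that work on plain arrays. Averaging over all pixels of a sample without special NaN handling is an assumption.

```python
import numpy as np

def rmse_per_sample(original, reconstructed):
    """Root mean squared residual over all pixels of each sample."""
    res = original - reconstructed
    return np.sqrt(np.mean(res ** 2, axis=(1, 2)))

def relative_rmse_per_sample(original, reconstructed):
    """RMSE divided by the mean of the original signal, per sample."""
    return rmse_per_sample(original, reconstructed) / np.mean(original, axis=(1, 2))

original = np.stack([np.full((2, 2), 2.0), np.full((2, 2), 4.0)])
reconstructed = original - 1.0  # constant residual of 1 everywhere

rmse = rmse_per_sample(original, reconstructed)
relative_rmse = relative_rmse_per_sample(original, reconstructed)
print(rmse)           # [1. 1.]
print(relative_rmse)  # [0.5  0.25]
```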

sample_rmse

sample_rmse()

Calculate the root mean squared error (RMSE) of the EEM of each sample.

Returns:

Name	Type	Description
`rmse`	`DataFrame`	Table of RMSE

sample_summary

sample_summary()

Get a table showing the score, Fmax, leverage, RMSE and normalized RMSE for each sample.

Returns:

Name	Type	Description
`summary`	`DataFrame`	Table of samples' score, Fmax, leverage, RMSE and normalized RMSE.

variance_explained

variance_explained()

Calculate the explained variance of the established PARAFAC model.

Returns:

Name	Type	Description
`ev`	`float`	the explained variance

EEMNMF

Non-negative matrix factorization (NMF) model for an excitation–emission matrix (EEM) dataset.

This class fits a low-rank NMF decomposition to a 3D EEM stack by unfolding it into a 2D non-negative matrix with shape (n_samples, n_pixels) and factorizing it into:
- A non-negative sample score matrix W with shape (n_samples, n_components).
- A non-negative component matrix H with shape (n_components, n_pixels), where n_pixels = n_ex * n_em in the unfolded representation.

The fitted NMF components are reshaped back to EEM form with shape (n_components, n_ex, n_em). Component amplitudes are reported as Fmax-like values using:
- fmax: scores from the NMF factorization, rescaled to account for component normalization.
- nnls_fmax: scores refit by non-negative least squares (NNLS) against the extracted components, which can differ slightly from fmax due to the non-exact NMF reconstruction and/or constraints.
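
The unfolding and reshaping steps can be illustrated with NumPy alone. The matrices W and H below are random placeholders rather than fitted factors, and the row-major flattening of the (ex, em) grid is an assumption about the pixel ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_ex, n_em, n_components = 5, 3, 4, 2

eem_stack = rng.random((n_samples, n_ex, n_em))

# Unfold each EEM into a row of n_pixels = n_ex * n_em values.
X = eem_stack.reshape(n_samples, n_ex * n_em)

W = rng.random((n_samples, n_components))     # sample scores
H = rng.random((n_components, n_ex * n_em))   # unfolded components

# Fold the components and the reconstruction back to EEM form.
components = H.reshape(n_components, n_ex, n_em)
reconstructed = (W @ H).reshape(n_samples, n_ex, n_em)

print(X.shape, components.shape, reconstructed.shape)  # (5, 12) (2, 3, 4) (5, 3, 4)
```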

Optional regularization / constraints (solver-dependent) include:
- Non-negativity (always enforced by this model).
- Elastic-net regularization on W and/or H (L1/L2 mix).
- Quadratic priors on W and/or H (controlled by prior_dict_W, prior_dict_H and gamma_W, gamma_H), with NaNs allowed to skip entries. This is useful when fitted scores or spectral components are desired to be close (but not necessarily identical) to prior knowledge. For example, if a component’s concentration is known for some samples, a prior vector of length n_samples can be passed with real values for known samples and NaN for unknown samples.
- A ratio constraint on paired rows of W: W[idx_top] ≈ beta * W[idx_bot]. This is useful when the ratios of component amplitudes between two sets of samples are desired to be constant. For example, if each sample is measured both unquenched and quenched using a fixed quencher dosage, then for a given chemically consistent component the ratio between unquenched and quenched amplitudes may be approximately constant across samples (Hu et al., ES&T, 2025). In this case, passing the unquenched and quenched sample indices to idx_top and idx_bot encourages a constant ratio. lam controls the strength of this regularization.

Parameters:

Name	Type	Description	Default
`n_components`	`int`	Number of NMF components (rank of the factorization).	required
`solver`	`{'cd', 'mu', 'hals'}`	Optimization algorithm used to fit NMF. - `'cd'`: Coordinate Descent solver (scikit-learn `decomposition.NMF`). - `'mu'`: Multiplicative Updates solver (scikit-learn `decomposition.NMF`). - `'hals'`: Hierarchical Alternating Least Squares solver with optional priors/regularization (`eempy.solver.nmf_with_prior_hals`).	`'cd'`
`init`	`str`	Initialization strategy passed to the selected solver. Common options include `'random'`, `'nndsvd'`, `'nndsvda'`, `'nndsvdar'` (solver-dependent). For HALS, a custom initialization can be provided via `custom_init` when supported.	`'nndsvda'`
`custom_init`	`optional`	Custom initialization passed to the HALS solver (when supported by the backend implementation).	`None`
`fixed_components`	`optional`	Component(s) to keep fixed during fitting (backend-specific behavior).	`None`
`beta_loss`	`{'frobenius', 'kullback-leibler', 'itakura-saito'}`	Beta divergence used by the `'mu'` solver. Ignored by `'cd'` and `'hals'`.	`'frobenius'`
`alpha_sample`	`float`	Regularization strength applied to the sample-mode factor matrix `W` (backend-specific). For scikit-learn, this maps to `alpha_W`.	`0`
`alpha_component`	`float`	Regularization strength applied to the component matrix `H` (backend-specific). For scikit-learn, this maps to `alpha_H`.	`0`
`l1_ratio`	`float`	Elastic-net mixing parameter used by the backend (`1` corresponds to L1 only; `0` to L2 only).	`1`
`prior_dict_W`	`dict`	Prior information for the sample-mode factor matrix `W` (HALS solver only). Keys are component indices (int); values are 1D arrays of length `n_samples`. Use NaNs to indicate unknown entries that should not contribute to the penalty.	`None`
`prior_dict_H`	`dict`	Prior information for the component matrix `H` (HALS solver only). Keys are component indices (int); values are 1D arrays of length `n_pixels`. Use NaNs to indicate unknown entries that should not contribute to the penalty.	`None`
`prior_dict_A`	`dict`	Additional prior mapping used by the HALS backend (backend-specific).	`None`
`prior_dict_B`	`dict`	Additional prior mapping used by the HALS backend (backend-specific).	`None`
`prior_dict_C`	`dict`	Additional prior mapping used by the HALS backend (backend-specific).	`None`
`gamma_W`	`float`	Additional prior/penalty strength for the sample-mode factor matrix `W` (HALS solver only).	`0`
`gamma_H`	`float`	Additional prior/penalty strength for the component matrix `H` (HALS solver only).	`0`
`gamma_A`	`float`	Additional prior/penalty strength for backend-specific prior term A (HALS solver only).	`0`
`gamma_B`	`float`	Additional prior/penalty strength for backend-specific prior term B (HALS solver only).	`0`
`gamma_C`	`float`	Additional prior/penalty strength for backend-specific prior term C (HALS solver only).	`0`
`ref_components`	`optional`	Reference component definitions used by the backend prior/regularization logic (backend-specific).	`None`
`kw_top`	`str`	Keyword used to identify "top" EEM from `eem_dataset.index` during fitting. "Top" and "bot" EEMs are assumed to be paired one-to-one and aligned by selection order (first "top" ↔ first "bot", etc.). A recommended naming convention is "a_sharing_sample_name" + "kw_top" or "kw_bot" for the quenched and unquenched EEM derived from the same original sample, so the pair differs only by `kw_top`/`kw_bot` and alignment is preserved when selecting by keywords. An alternative approach is to provide `idx_top` and `idx_bot` to directly specify "top" and "bot" EEMs by positions.	`None`
`kw_bot`	`str`	Keyword used to identify "bot" EEM from `eem_dataset.index` during fitting.	`None`
`idx_top`	`list of int`	0-based integer positions of samples in eem_dataset used as the numerator ("top") group (e.g., [0, 1, 2]).	`None`
`idx_bot`	`list of int`	0-based integer positions of samples in eem_dataset used as the denominator ("bot") group (e.g., [3, 4, 5]).	`None`
`lam`	`float`	Strength of ratio-based regularization between "top" and "bot" samples (HALS solver only).	`0`
`fit_rank_one`	`bool`	Whether to enable a rank-one component constraint/penalty in the HALS backend (backend-specific).	`False`
`normalization`	`{'pixel_std', None}`	Optional preprocessing applied to the unfolded data matrix before factorization. - `None`: no normalization. - `'pixel_std'`: divide each pixel (feature) by its standard deviation across samples.	`None`
`sort_components_by_em`	`bool`	Whether to sort components by the emission peak position (ascending). If `False`, components are kept in the solver output order.	`True`
`max_iter_als`	`int`	Maximum number of outer iterations for the HALS solver.	`100`
`max_iter_nnls`	`int`	Maximum number of iterations for NNLS subproblems (when used by the backend).	`500`
`tol`	`float`	Convergence tolerance passed to the solver.	`1e-5`
`random_state`	`int`	Random seed used by solvers that support it.	`42`
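
The keyword-based pairing described for `kw_top` and `kw_bot` can be pictured with the sketch below. `select_pairs` is a hypothetical helper, and plain substring matching on the sample names is an assumption about how keywords are resolved.

```python
def select_pairs(index, kw_top, kw_bot):
    """Return positions of 'top' and 'bot' samples, paired by selection order."""
    idx_top = [i for i, name in enumerate(index) if kw_top in name]
    idx_bot = [i for i, name in enumerate(index) if kw_bot in name]
    if len(idx_top) != len(idx_bot):
        raise ValueError("'top' and 'bot' groups must have the same size.")
    return idx_top, idx_bot


# Hypothetical sample names: shared sample name + keyword suffix.
index = ["siteA_U", "siteA_Q", "siteB_U", "siteB_Q"]
idx_top, idx_bot = select_pairs(index, kw_top="_U", kw_bot="_Q")
```

Under substring matching, one keyword must not be contained in the other (for example, "quenched" is a substring of "unquenched"), otherwise a sample would be selected into both groups.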

Attributes:

Name	Type	Description
`fmax`	`DataFrame or None`	Sample-mode component amplitudes computed from the fitted NMF `W` (and rescaling after component normalization). Columns follow the naming convention `"component {i} NMF-Fmax"`.
`nnls_fmax`	`DataFrame or None`	Component amplitudes computed by refitting each EEM using NNLS with the fitted components. Columns follow the naming convention `"component {i} NNLS-Fmax"`.
`components`	`ndarray or None`	Component EEMs with shape `(n_components, n_ex, n_em)` constructed from the unfolded `H`. Each component is normalized by its maximum value (peak intensity equals 1), and the scaling is carried into `fmax`.
`eem_stack_train`	`ndarray or None`	EEM stack used for model fitting, with shape `(n_samples, n_ex, n_em)`.
`eem_stack_reconstructed`	`ndarray or None`	Reconstructed EEM stack from the fitted model, with shape `(n_samples, n_ex, n_em)`.
`eem_stack_unfolded`	`ndarray or None`	Unfolded 2D matrix used by the solver, with shape `(n_samples, n_pixels)`.
`normalization_factor_std`	`ndarray or None`	Per-pixel standard deviation used when `normalization='pixel_std'`. Shape is `(n_pixels,)`. `None` if no pixel-wise standard-deviation normalization was applied.
`normalization_factor_max`	`ndarray or None`	Per-component scaling factors (maximum value of each component in the unfolded space) used to normalize `components` and rescale reported amplitudes. Shape is `(n_components,)`.
`ex_range`	`ndarray or None`	Excitation wavelength grid corresponding to `components`.
`em_range`	`ndarray or None`	Emission wavelength grid corresponding to `components`.
`beta`	`ndarray or None`	Component-wise ratio parameters used when ratio regularization / beta fitting is enabled (backend-specific).
`decomposer`	`object or None`	Underlying solver object when using scikit-learn NMF (e.g., fitted `sklearn.decomposition.NMF`). May be `None` depending on the solver/backend implementation.
`reconstruction_error`	`float or None`	Reconstruction error if provided by the backend/solver; otherwise `None`.
`objective_function_error`	`object or None`	Objective tracking information if provided by the backend/solver; otherwise `None`.
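
The relationship between `eem_stack_unfolded`, `components`, `normalization_factor_max` and `fmax` can be sketched with plain NumPy. The row-major unfolding order and the toy factor matrices are assumptions; the sketch only shows that moving each component's maximum into the amplitudes leaves the reconstruction unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_ex, n_em, n_components = 5, 4, 6, 2

# Toy non-negative factors standing in for a fitted NMF (X ~ W @ H).
W = rng.random((n_samples, n_components))
H = rng.random((n_components, n_ex * n_em))
eem_stack = (W @ H).reshape(n_samples, n_ex, n_em)

# Unfold (n_samples, n_ex, n_em) -> (n_samples, n_pixels).
eem_stack_unfolded = eem_stack.reshape(n_samples, n_ex * n_em)

# Normalize each component to a peak of 1 and carry the scale into the amplitudes.
normalization_factor_max = H.max(axis=1)
components = (H / normalization_factor_max[:, None]).reshape(n_components, n_ex, n_em)
fmax = W * normalization_factor_max[None, :]

# The rescaled factors reconstruct the same EEM stack.
reconstructed = (fmax @ components.reshape(n_components, -1)).reshape(n_samples, n_ex, n_em)
```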

References

[1] scikit-learn documentation for sklearn.decomposition.NMF (Coordinate Descent and Multiplicative Updates). [2] Hu, Yongmin, Céline Jacquin, and Eberhard Morgenroth. "Fluorescence Quenching as a Diagnostic Tool for Prediction Reliability Assessment and Anomaly Detection in EEM-Based Water Quality Monitoring." Environmental Science & Technology 59.36 (2025): 19490-19501.

component_peak_locations

component_peak_locations()

Get the ex/em of component peaks

Returns:

Name	Type	Description
`max_exem`	`list`	A list of (ex, em) tuples, one for each component peak.
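
A minimal sketch of how peak locations could be read from component EEMs is shown below. `peak_locations` is a hypothetical stand-alone function; it assumes that the first axis of each component follows `ex_range` and the second follows `em_range`.

```python
import numpy as np


def peak_locations(components, ex_range, em_range):
    """Return the (ex, em) wavelength pair at the maximum of each component."""
    max_exem = []
    for comp in components:
        i_ex, i_em = np.unravel_index(np.argmax(comp), comp.shape)
        max_exem.append((float(ex_range[i_ex]), float(em_range[i_em])))
    return max_exem


ex_range = np.array([250.0, 275.0, 300.0])
em_range = np.array([350.0, 400.0, 450.0, 500.0])

components = np.zeros((2, 3, 4))
components[0, 1, 2] = 1.0  # peak at ex = 275, em = 450
components[1, 2, 0] = 1.0  # peak at ex = 300, em = 350

max_exem = peak_locations(components, ex_range, em_range)
```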

fit

fit(eem_dataset)

Fit NMF model.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	The EEM dataset used to fit the NMF model.	required

predict

predict(eem_dataset: EEMDataset, fit_intercept=False, fit_beta=False, idx_top=None, idx_bot=None)

Predict the score and Fmax of a given EEM dataset using the fitted components. This method can be applied to a new EEM dataset that is independent of the one used to fit the NMF model.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	The EEM dataset to be predicted.	required
`fit_intercept`	`bool`	Whether to calculate the intercept.	`False`
`fit_beta`	`bool`	Whether to fit the beta parameter (the proportions between "top" and "bot" samples).	`False`
`idx_top`	`list`	List of indices of samples serving as numerators in ratio calculation.	`None`
`idx_bot`	`list`	List of indices of samples serving as denominators in ratio calculation.	`None`

Returns:

Name	Type	Description
`score_sample`	`DataFrame`	The fitted score.
`fmax_sample`	`DataFrame`	The fitted Fmax.
`eem_stack_pred`	`np.ndarray (3d)`	The reconstructed EEM stack.
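
The idea of projecting new EEMs onto fixed components can be sketched as below. This sketch uses unconstrained least squares followed by clipping at zero as a crude stand-in for a non-negative fit; it is not the solver used by `predict`, and it ignores `fit_intercept` and `fit_beta`.

```python
import numpy as np

rng = np.random.default_rng(1)
n_components, n_ex, n_em = 2, 4, 5

# Toy components, each normalized to a peak of 1.
components = rng.random((n_components, n_ex, n_em))
components /= components.max(axis=(1, 2), keepdims=True)
H = components.reshape(n_components, -1)

# New samples built from known amplitudes so the result can be checked.
true_fmax = np.array([[1.0, 3.0],
                      [2.0, 0.5],
                      [0.0, 4.0]])
new_stack = (true_fmax @ H).reshape(-1, n_ex, n_em)

# Solve H.T @ F.T ~ X.T for the amplitudes F, then clip negatives to zero.
X = new_stack.reshape(new_stack.shape[0], -1)
fmax_ls, *_ = np.linalg.lstsq(H.T, X.T, rcond=None)
fmax_sample = np.clip(fmax_ls.T, 0.0, None)

# Reconstruct the EEM stack from the fitted amplitudes.
eem_stack_pred = (fmax_sample @ H).reshape(new_stack.shape)
```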

residual

residual()

Get the residual of the established NMF model, i.e., the difference between the original EEM dataset and the reconstructed EEM dataset.

Returns:

Name	Type	Description
`res`	`np.ndarray (3d)`	the residual

sample_normalized_rmse

sample_normalized_rmse()

Calculate the normalized root mean squared error (normalized RMSE) of the EEM of each sample, defined as the RMSE divided by the mean of the original signal.

Returns:

Name	Type	Description
`normalized_sse`	`DataFrame`	Table of normalized RMSE

sample_rmse

sample_rmse()

Calculate the root mean squared error (RMSE) of EEM of each sample.

Returns:

Name	Type	Description
`sse`	`DataFrame`	Table of RMSE
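
The residual, per-sample RMSE and normalized RMSE described above can be written in a few lines of NumPy. The functions below are stand-alone sketches that take the original and reconstructed stacks as arguments (the methods themselves take no arguments), and they return arrays rather than DataFrames.

```python
import numpy as np


def sample_rmse(eem_stack, eem_stack_reconstructed):
    """RMSE of each sample, computed over all ex/em pixels."""
    res = eem_stack - eem_stack_reconstructed  # residual stack
    return np.sqrt(np.mean(res ** 2, axis=(1, 2)))


def sample_normalized_rmse(eem_stack, eem_stack_reconstructed):
    """RMSE of each sample divided by the mean of its original signal."""
    rmse = sample_rmse(eem_stack, eem_stack_reconstructed)
    return rmse / np.mean(eem_stack, axis=(1, 2))


original = np.full((2, 2, 2), 10.0)
reconstructed = original.copy()
reconstructed[0] -= 1.0  # constant error of 1 in the first sample only

rmse = sample_rmse(original, reconstructed)
nrmse = sample_normalized_rmse(original, reconstructed)
```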

variance_explained

variance_explained()

Calculate the explained variance of the established NMF model

Returns:

Name	Type	Description
`ev`	`float`	the explained variance
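
One common definition of explained variance is one minus the ratio of the residual sum of squares to the total sum of squares about the mean. The sketch below uses that definition as an assumption; the exact formula used by `variance_explained` is not stated here.

```python
import numpy as np


def explained_variance(eem_stack, eem_stack_reconstructed):
    """1 - SS_res / SS_tot, with SS_tot taken about the overall mean (assumed)."""
    ss_res = np.sum((eem_stack - eem_stack_reconstructed) ** 2)
    ss_tot = np.sum((eem_stack - eem_stack.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


original = np.arange(24, dtype=float).reshape(2, 3, 4)

# A perfect reconstruction explains all of the variance.
ev_perfect = explained_variance(original, original)

# Predicting the overall mean everywhere explains none of it.
ev_mean = explained_variance(original, np.full_like(original, original.mean()))
```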

SplitValidation

Validate PARAFAC or NMF models by comparing component consistency across EEM sub-datasets.

Parameters:

Name	Type	Description	Default
`base_model`	`PARAFAC or EEMNMF`	Base model used to fit each sub-dataset.	required
`n_splits`	`int`	Number of splits used to create sub-datasets.	`4`
`combination_size`	`int or {"half"}`	Number of splits assembled into each combination. If "half" is passed, each combination uses half of the splits (split-half validation).	`"half"`
`rule`	`{"random", "sequential"}`	Split rule for the dataset. "sequential" splits by index order.	`"random"`
`random_state`	`int`	Random seed used when `rule="random"`.	`None`
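
The interplay of `n_splits`, `combination_size` and `rule` can be sketched as follows. `make_split_combinations` and the letter labels are hypothetical; the sketch only illustrates how splits might be assembled into combinations, not how the class labels or pairs them internally.

```python
from itertools import combinations

import numpy as np


def make_split_combinations(n_samples, n_splits=4, combination_size="half",
                            rule="random", random_state=None):
    """Split sample positions into n_splits parts and assemble combinations."""
    order = np.arange(n_samples)
    if rule == "random":
        order = np.random.default_rng(random_state).permutation(n_samples)
    splits = np.array_split(order, n_splits)

    k = n_splits // 2 if combination_size == "half" else combination_size
    combos = {}
    for chosen in combinations(range(n_splits), k):
        label = "".join(chr(ord("A") + i) for i in chosen)  # e.g. "AB"
        combos[label] = np.sort(np.concatenate([splits[i] for i in chosen]))
    return combos


# 8 samples, 4 sequential splits, half of the splits per combination.
combos = make_split_combinations(8, n_splits=4, rule="sequential")
```

With four splits and `combination_size="half"`, there are six combinations, and each one has a complementary combination built from the remaining splits (for example "AB" and "CD").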

Attributes:

Name	Type	Description
`eem_subsets`	`dict`	Mapping of subset labels to EEMDataset instances.
`subset_specific_models`	`dict`	Mapping of subset labels to fitted PARAFAC or EEMNMF models.
`eem_dataset_full`	`EEMDataset or None`	The full dataset used to generate splits.

compare_components

compare_components()

Compare component EEMs between models fitted to paired sub-datasets.

Returns:

Name	Type	Description
`similarities_components`	`DataFrame`	Similarity scores for component EEMs.
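
The similarity metric is not specified in this section. Tucker's congruence coefficient is a common choice for comparing component shapes, so the sketch below uses it as an assumption.

```python
import numpy as np


def tucker_congruence(a, b):
    """Cosine of the angle between two flattened components."""
    a = np.ravel(a)
    b = np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


comp = np.array([[0.0, 1.0],
                 [2.0, 3.0]])

# Identical shape with a different scale gives a congruence of 1.
same_shape = tucker_congruence(comp, 5.0 * comp)

# Non-overlapping peaks give a congruence of 0.
different = tucker_congruence(np.array([[1.0, 0.0], [0.0, 0.0]]),
                              np.array([[0.0, 1.0], [0.0, 0.0]]))
```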

compare_parafac_loadings

compare_parafac_loadings()

Compare excitation/emission loadings between PARAFAC models fitted to paired sub-datasets.

This method is only meaningful for PARAFAC models because it relies on Ex/Em loadings.

Returns:

Name	Type	Description
`similarities_ex`	`DataFrame`	Similarity scores for excitation loadings per component.
`similarities_em`	`DataFrame`	Similarity scores for emission loadings per component.

correlation_cv

correlation_cv(ref_col)

Cross-validate reference correlations using component Fmax values.

For each split pair, fit a linear regression on the training subset and evaluate on the paired test subset. Metrics are reported for each component as R2 and RMSE for both training and test.

Parameters:

Name	Type	Description	Default
`ref_col`	`str`	Column name in `eem_dataset_full.ref` used as the reference variable.	required

Returns:

Type	Description
`DataFrame`	Table of R2 and RMSE metrics for each component and split pairing.
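
For a single component and a single split pair, the train/test evaluation could look like the sketch below. `fit_and_score` is a hypothetical helper, and regressing the reference variable on Fmax (rather than the reverse) is an assumption.

```python
import numpy as np


def fit_and_score(fmax_train, ref_train, fmax_test, ref_test):
    """Fit ref = slope * fmax + intercept on train; report R2 and RMSE on both sets."""
    slope, intercept = np.polyfit(fmax_train, ref_train, deg=1)

    def metrics(x, y):
        pred = slope * x + intercept
        ss_res = np.sum((y - pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1.0 - ss_res / ss_tot, np.sqrt(np.mean((y - pred) ** 2))

    r2_train, rmse_train = metrics(fmax_train, ref_train)
    r2_test, rmse_test = metrics(fmax_test, ref_test)
    return {"r2_train": r2_train, "rmse_train": rmse_train,
            "r2_test": r2_test, "rmse_test": rmse_test}


# Toy data following ref = 2 * fmax + 1 exactly.
fmax_train = np.array([1.0, 2.0, 3.0, 4.0])
ref_train = 2.0 * fmax_train + 1.0
fmax_test = np.array([5.0, 6.0])
ref_test = 2.0 * fmax_test + 1.0

scores = fit_and_score(fmax_train, ref_train, fmax_test, ref_test)
```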

fit

fit(eem_dataset: EEMDataset)

Fit the base model on each sub-dataset and store the fitted models.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	Full dataset used for splitting and model fitting.	required

Returns:

Name	Type	Description
`self`	`SplitValidation`	Fitted validation object.

KMethod

K-method (e.g., K-PARAFACs or K-NMFs) for EEM clustering by minimizing reconstruction error (Hu et al., Water Research, 2025).

This class implements the K-method family of clustering algorithms for excitation–emission matrix (EEM) datasets. The key hypothesis is that fitting EEMs with high chemical composition variability using a single, unified set of components (e.g., one PARAFAC or NMF model) can lead to over-generalized component formation and large reconstruction error. In contrast, EEMs sharing similar chemical compositions can be clustered and represented by cluster-specific component sets, resulting in a number of unique component sets that better capture the variability in chemical composition between clusters and reduce overall reconstruction error.

Based on this hypothesis, K-method searches for a clustering strategy that minimizes the overall reconstruction error by iterating between:

- Estimation: fit a base decomposition model (base_model) separately on each current cluster to obtain cluster-specific models.
- Assignment: assign each sample to the cluster whose model yields the smallest distance (e.g., reconstruction RMSE), forming updated clusters.

Repeating this procedure yields cluster-specific PARAFAC/NMF models that (ideally) reconstruct the dataset better than a single unified model.
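
The estimation/assignment loop can be sketched on unfolded data as below. To keep the example self-contained, a truncated SVD basis stands in for the PARAFAC/NMF base model, and the function names, the random initial split and the stopping rule (labels no longer change) are all assumptions rather than the class's actual implementation.

```python
import numpy as np


def fit_basis(X, rank):
    """Stand-in for fitting the base model: top right-singular vectors of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank]


def per_sample_rmse(X, basis):
    """RMSE of each row of X after projection onto the given basis."""
    recon = X @ basis.T @ basis
    return np.sqrt(np.mean((X - recon) ** 2, axis=1))


def k_method(X, n_clusters, rank, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=X.shape[0])  # random initial split
    for _ in range(max_iter):
        # Estimation: one model per cluster; clusters smaller than `rank` are dropped.
        kept = [k for k in np.unique(labels) if np.sum(labels == k) >= rank]
        bases = [fit_basis(X[labels == k], rank) for k in kept]
        # Assignment: each sample goes to the model with the smallest RMSE.
        errors = np.column_stack([per_sample_rmse(X, b) for b in bases])
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, errors


# Toy data: two groups of samples with different spectral profiles plus small noise.
rng = np.random.default_rng(0)
p1 = np.array([1.0, 0.8, 0.1, 0.0, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 0.1, 0.7, 1.0, 0.5])
X = np.vstack([np.outer(rng.uniform(1, 5, 10), p1),
               np.outer(rng.uniform(1, 5, 10), p2)])
X += 0.01 * rng.random(X.shape)

labels, errors = k_method(X, n_clusters=2, rank=1)
unified_error = per_sample_rmse(X, fit_basis(X, 1))  # single model on all samples
```

Because each cluster-specific basis is the best rank-1 fit for its own cluster, the total squared error of the clustered fit in this sketch cannot exceed that of the single unified basis.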

In addition, K-method can be run multiple times with subsampling to form a consensus matrix and then derive a final clustering using hierarchical clustering on a distance matrix computed from consensus values.

Parameters:

Name	Type	Description	Default
`base_model`	`object`	Base decomposition model used within each cluster (e.g., an instance of `PARAFAC` or `EEMNMF`). Before passing to `KMethod`, the base model should be properly configured (e.g., number of components, regularizations to be implemented, etc.).	required
`n_initial_splits`	`int`	Number of splits used in initialization (the first partition of the dataset before iterative refinement).	required
`distance_metric`	`{'reconstruction_error', 'reconstruction_error_with_beta', 'quenching_coefficient'}`	Criterion used to assign samples to clusters in the assignment step. - `'reconstruction_error'`: assign each sample to the model with the smallest per-sample RMSE. - `'reconstruction_error_with_beta'`: like reconstruction error, but pairs samples into top/bot groups and uses beta-based reconstruction that forces the fmax ratios between paired samples to equal the beta values across all samples (requires `kw_top`, `kw_bot` in `base_model`). - `'quenching_coefficient'`: assign samples based on similarity of estimated quenching coefficients derived from paired top/bot samples (requires `kw_top` and `kw_bot`).	`'reconstruction_error'`
`max_iter`	`int`	Maximum number of K-method iterations in a single base clustering run.	`20`
`tol`	`float`	Convergence tolerance based on similarity between cluster-specific models of two consecutive iterations. If the average Tucker’s congruence (or component similarity proxy) exceeds `1 - tol`, convergence is declared.	`0.001`
`elimination`	`{'default'} or int`	Minimum allowed cluster size during optimization. Clusters with fewer samples than the threshold are removed. - `'default'`: use `base_model.n_components` as the minimum cluster size. - `int`: explicit minimum cluster size.	`'default'`

Attributes:

Name	Type	Description
`unified_model`	`object or None`	Unified model fitted once on the full dataset (a deep copy of `base_model`). Used as a reference for aligning components and for some distance calculations.
`label_history`	`list or None`	History of cluster assignments. For base clustering runs, this is typically a list containing a DataFrame with per-sample labels across iterations.
`error_history`	`list or None`	History of per-sample distances/errors (e.g., RMSE) across iterations, typically stored as DataFrames.
`silhouette_score`	`float or None`	Silhouette score computed on the final distance matrix during hierarchical clustering (when available).
`labels`	`ndarray or None`	Final cluster labels for each sample. Labels are cluster IDs returned by hierarchical clustering (typically 1..K), or by base clustering when used directly.
`index_sorted`	`list or None`	Dataset index reordered by the final hierarchical clustering labels (when available).
`ref_sorted`	`DataFrame or None`	Reference table reordered by the final hierarchical clustering labels (when available).
`threshold_r`	`float or None`	Distance threshold used for hierarchical clustering cut (derived from the linkage matrix).
`eem_clusters`	`dict or None`	Mapping from cluster label to an `EEMDataset` containing the EEMs assigned to that cluster.
`cluster_specific_models`	`dict or None`	Mapping from cluster label to the fitted cluster-specific model (deep copies of `base_model` fitted on each cluster).
`consensus_matrix`	`ndarray or None`	Consensus matrix `M` with shape `(n_samples, n_samples)`, where `M[i, j]` is the fraction of base runs in which sample i and j co-occur in the same cluster.
`distance_matrix`	`ndarray or None`	Distance matrix derived from consensus, typically `D[i, j] = (1 - M[i, j])**p` (see `consensus_conversion_power`).
`linkage_matrix`	`ndarray or None`	Hierarchical clustering linkage matrix computed from the consensus-derived distance matrix.
`consensus_matrix_sorted`	`ndarray or None`	Consensus matrix reordered by the final cluster labels for visualization.
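
The consensus and distance matrices described above can be sketched as follows. The sketch assumes every base run labels every sample; with subsampling, the co-occurrence fraction would presumably be taken only over runs in which both samples were drawn.

```python
import numpy as np


def consensus_matrix(label_runs):
    """M[i, j] = fraction of runs in which samples i and j share a cluster."""
    label_runs = np.asarray(label_runs)  # shape (n_runs, n_samples)
    same = label_runs[:, :, None] == label_runs[:, None, :]
    return same.mean(axis=0)


label_runs = [[0, 0, 1, 1],
              [1, 1, 0, 0],   # same partition as the first run, labels swapped
              [0, 1, 1, 1]]

M = consensus_matrix(label_runs)
D = (1.0 - M) ** 1  # distance matrix with a conversion power of 1
```

Only co-membership matters, so runs that produce the same partition under different label names contribute identically to `M`.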

References

[1] Hu, Yongmin, Eberhard Morgenroth, and Céline Jacquin. "Online monitoring of greywater reuse system using excitation-emission matrix (EEM) and K-PARAFACs." Water Research 268 (2025): 122604.

base_clustering

base_clustering(eem_dataset: EEMDataset)

Run the clustering once.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	The EEM dataset to be clustered.	required

Returns:

Name	Type	Description
`cluster_labels`	`ndarray`	Cluster labels.
`label_history`	`list`	Cluster labels in each iteration.
`error_history`	`list`	Average reconstruction error (RMSE) in each iteration.

calculate_consensus

calculate_consensus(eem_dataset: EEMDataset, n_base_clusterings: int, subsampling_portion: float)

Run the clustering many times and combine the output of each run to obtain an optimal clustering.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	EEM dataset.	required
`n_base_clusterings`	`int`	Number of base clustering runs.	required
`subsampling_portion`	`float`	The fraction of EEMs retained after subsampling.	required

Returns:

Name	Type	Description
`self`	`object`	The fitted KMethod instance.

hierarchical_clustering

hierarchical_clustering(eem_dataset, n_clusters, consensus_conversion_power=1)

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	EEM dataset to cluster.	required
`n_clusters`	`int`	Number of clusters.	required
`consensus_conversion_power`	`float`	The exponent used to convert the consensus matrix (M) into the distance matrix (D) for hierarchical clustering: D_{i,j} = (1 - M_{i,j})^consensus_conversion_power. It controls how steeply distance changes with consensus; a smaller value leads to a sharper increase in distance at consensus values close to 1.	`1`
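
The effect of the exponent can be seen by converting a few consensus values with different powers. The values below are arbitrary and chosen only for illustration.

```python
import numpy as np

consensus = np.array([0.5, 0.9, 0.99])

# Distance D = (1 - M) ** power for three example powers.
distances = {power: (1.0 - consensus) ** power for power in (0.5, 1.0, 2.0)}

for power, d in distances.items():
    print(f"power={power}: {np.round(d, 4)}")
```

At a consensus of 0.99, a power of 0.5 still yields a distance of about 0.1, whereas a power of 2 yields about 0.0001, so small powers separate nearly-always-co-clustered pairs much more strongly.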

predict

predict(eem_dataset: EEMDataset)

Fit the cluster-specific models to a given EEM dataset. Each EEM in the dataset is assigned to the model that produces the lowest RMSE.

Parameters:

Name	Type	Description	Default
`eem_dataset`	`EEMDataset`	The EEM dataset to be predicted.	required

Returns:

Name	Type	Description
`best_model_label`	`DataFrame`	The best-fit model for every EEM.
`score_all`	`DataFrame`	The score fitted with each cluster-specific model.
`fmax_all`	`DataFrame`	The fmax fitted with each cluster-specific model.
`sample_error`	`DataFrame`	The RMSE fitted with each cluster-specific model.
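
Selecting the best-fit model from a table of per-model errors amounts to a row-wise argmin. The sketch below uses a plain array with made-up RMSE values and assumed cluster labels 1 and 2 rather than the DataFrames returned by `predict`.

```python
import numpy as np

# Rows: samples; columns: RMSE under each cluster-specific model (made-up values).
sample_error = np.array([[0.10, 0.40],
                         [0.35, 0.05],
                         [0.20, 0.25]])
cluster_labels = np.array([1, 2])  # assumed label of each column's model

# Pick, for every sample, the label of the model with the lowest RMSE.
best_model_label = cluster_labels[sample_error.argmin(axis=1)]
```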

combine_eem_datasets

combine_eem_datasets(list_eem_datasets)

Combine all EEMDataset objects in a list

Parameters:

Name	Type	Description	Default
`list_eem_datasets`	`list`	List of EEM datasets.	required

Returns:

Name	Type	Description
`eem_dataset_combined`	`EEMDataset`	EEM dataset combined.
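
Conceptually, combining datasets means stacking the EEMs along the sample axis and concatenating the sample names. The sketch below works on raw arrays and lists; `combine_stacks` is a hypothetical helper, and it assumes all datasets share the same excitation/emission grid. How the real function treats `ref` and `cluster` is not covered here.

```python
import numpy as np


def combine_stacks(stacks, indexes):
    """Concatenate EEM stacks along the sample axis and merge their sample names."""
    grid_shapes = {stack.shape[1:] for stack in stacks}
    if len(grid_shapes) != 1:
        raise ValueError("All stacks must share the same (n_ex, n_em) grid.")
    eem_stack_combined = np.concatenate(stacks, axis=0)
    index_combined = [name for index in indexes for name in index]
    return eem_stack_combined, index_combined


stack_a = np.zeros((2, 3, 4))
stack_b = np.ones((3, 3, 4))
eem_stack_combined, index_combined = combine_stacks(
    [stack_a, stack_b], [["a1", "a2"], ["b1", "b2", "b3"]]
)
```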