read_data

eempy.read_data.read_data

Functions for importing files. Author: Yongmin Hu (yongminhu@outlook.com). Last update: 2026-01.

get_filelist

get_filelist(folderpath, mandatory_keywords, optional_keywords)

Get a list containing all filenames with given keywords in a folder. A filename must contain all mandatory keywords and at least one of the optional keywords.

Parameters:

Name	Type	Description	Default
`folderpath`	`str`	Folder path to search.	required
`mandatory_keywords`	`str or list of str`	Filenames must contain all of these keywords (logical AND).	required
`optional_keywords`	`str or list of str`	Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords.	required

Returns:

Type	Description
`list of str`	Filenames matching the keyword filters.
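The AND/OR keyword rule described above can be sketched in plain Python. This is a minimal illustration and not eempy's implementation: `filter_filenames` and `_as_list` are hypothetical helpers, and the sketch filters a list of names already in memory rather than scanning a folder.

```python
from typing import Iterable, List, Union

Keywords = Union[str, Iterable[str], None]


def _as_list(keywords: Keywords) -> List[str]:
    """Normalize a single keyword, a collection of keywords, or None to a list."""
    if keywords is None:
        return []
    if isinstance(keywords, str):
        return [keywords]
    return list(keywords)


def filter_filenames(filenames, mandatory_keywords, optional_keywords=None):
    """Hypothetical helper: keep names with ALL mandatory and ANY optional keywords."""
    mandatory = _as_list(mandatory_keywords)
    optional = _as_list(optional_keywords)
    matches = []
    for name in filenames:
        has_all = all(kw in name for kw in mandatory)
        # An empty optional list means no additional filtering.
        has_any = not optional or any(kw in name for kw in optional)
        if has_all and has_any:
            matches.append(name)
    return matches


names = [
    "ABS_2021-02-02_PEM.dat",
    "ABS_2021-02-03_PEM.dat",
    "EEM_2021-02-02_PEM.dat",
    "ABS_2021-02-02_blank.dat",
]
selected = filter_filenames(names, ["ABS", "2021-02-02"], ["PEM", "SYM"])
abs_only = filter_filenames(names, "ABS", None)
```

Here `selected` keeps only the file that contains both mandatory keywords and at least one optional keyword, while `abs_only` keeps every name containing "ABS".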

read_abs

read_abs(file_path, index_pos: Union[Tuple, List, None] = None, data_format='default')

Import UV absorbance data from a UV absorbance file.

Parameters:

Name	Type	Description	Default
`file_path`	`str`	The filepath to the UV absorbance file	required
`index_pos`	`None or tuple with two elements`	The starting and ending positions of index in filenames. For example, if you want to read the index "2024_01_01" from the file with the name "EEM_2024_01_01_PEM.dat", a tuple (4, 13) should be passed to this parameter.	`None`
`data_format`	`str`	Format of the UV absorbance file. With `data_format='default'`, the following format is supported: `Ex_1 A_1 Ex_2 A_2 ... ... Ex_n A_n`, where A_i is the absorbance at wavelength Ex_i.	`'default'`

Returns:

Name	Type	Description
`absorbance`	`ndarray`	The UV absorbance spectrum (1D array).
`ex_range`	`ndarray`	The excitation wavelengths (1D array).
`index`	`str`	The index of the Absorbance spectrum.

read_abs_dataset

read_abs_dataset(folder_path, mandatory_keywords='ABS', optional_keywords=[], data_format: str = 'default', index_pos: Union[Tuple, List, None] = None, custom_filename_list: Union[Tuple, List, None] = None, wavelength_alignment=False, interpolation_method: str = 'linear')

Parameters:

Name	Type	Description	Default
`folder_path`	`str`	The path to the folder containing absorbance files.	required
`mandatory_keywords`	`str or list of str`	Filenames must contain all of these keywords (logical AND). Example: ["ABS", "2021-02-02"] matches only files containing both substrings.	`'ABS'`
`optional_keywords`	`list of str`	Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords.	`[]`
`data_format`	`str`	Specify the type of absorbance data format.	`'default'`
`index_pos`	`tuple/list or None`	The starting and ending positions of index in filenames. For example, if you want to read the index "2024_01_01" from the file with the name "ABS_2024_01_01_PEM.dat", a tuple (4, 13) should be passed to this parameter.	`None`
`custom_filename_list`	`list or None`	If a list is passed, only the absorbance files whose filenames are specified in the list will be imported.	`None`
`wavelength_alignment`	`bool`	Align the ex ranges of the absorbance files. This is useful if the absorbance spectra were measured with different ex ranges. Note that ex will be aligned according to the ex range with the smallest interval among all the imported absorbance files.	`False`
`interpolation_method`	`str`	The interpolation method used for aligning ex. Only used when wavelength_alignment=True.	`'linear'`

Returns:

Name	Type	Description
`abs_stack`	`ndarray`	A stack of imported absorbance files (n_samples, n_wavelengths).
`ex_range`	`ndarray`	The excitation wavelengths (1D array).
`indexes`	`list or None`	The list of absorbance file indexes (if index_pos is specified).
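To illustrate what `wavelength_alignment=True` with `interpolation_method='linear'` could look like, the sketch below interpolates every spectrum onto the wavelength grid with the smallest step. It is an assumption-laden illustration, not eempy's code: `align_spectra` is a hypothetical helper, "smallest interval" is interpreted as the smallest mean step, and wavelengths are assumed to be in ascending order.

```python
import numpy as np


def align_spectra(wavelengths_list, spectra_list):
    """Hypothetical helper: interpolate all spectra onto the finest wavelength grid."""
    # Assumption: the "smallest interval" grid is the one with the smallest mean step.
    steps = [np.mean(np.diff(w)) for w in wavelengths_list]
    target = np.asarray(wavelengths_list[int(np.argmin(steps))], dtype=float)
    # np.interp performs linear interpolation and requires ascending x values;
    # points outside a spectrum's range take that spectrum's end values.
    aligned = [np.interp(target, w, s) for w, s in zip(wavelengths_list, spectra_list)]
    return np.vstack(aligned), target


ex_a = np.array([200.0, 210.0, 220.0])
abs_a = np.array([0.3, 0.2, 0.1])
ex_b = np.array([200.0, 205.0, 210.0, 215.0, 220.0])
abs_b = np.array([0.5, 0.45, 0.4, 0.35, 0.3])

abs_stack, ex_range = align_spectra([ex_a, ex_b], [abs_a, abs_b])
```

The result has one row per sample and one column per wavelength of the finer grid, matching the `(n_samples, n_wavelengths)` shape documented for `abs_stack`.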

read_eem

read_eem(file_path: str, index_pos: Union[Tuple[int, int], List[int], None] = None, data_format: str = 'default', as_timestamp: bool = False, timestamp_format: Optional[str] = None, delimiter: Optional[str] = None, file_first_row: str = 'ex')

Import EEM from file (tabular format).

Parameters:

Name	Type	Description	Default
`file_path`	`str`	Filepath to the EEM file.	required
`index_pos`	`None or tuple/list with two elements`	Start and end positions of the index in the filename (1-based start, inclusive end). Example: (4, 13) extracts basename[3:13].	`None`
`data_format`	`str, {'default'}`	Format of the EEM file. With `'default'`, the following tabular format is supported: the first row contains excitation wavelengths (Ex); the top-left cell may be blank or non-numeric (non-numeric tokens are ignored); the first column of each subsequent row contains an emission wavelength (Em); the remaining cells are fluorescence intensities for each (Ex, Em) pair. Schematic layout: `<blank or label> Ex_1 Ex_2 Ex_3 ... Ex_n Em_1 I_11 I_12 I_13 ... I_1n Em_2 I_21 I_22 I_23 ... I_2n ... Em_m I_m1 I_m2 I_m3 ... I_mn`, where I_ij is the intensity at emission Em_i and excitation Ex_j. If your files use the same layout but with Em wavelengths in the first row and Ex wavelengths in the first column, pass `file_first_row='em'`.	`'default'`
`as_timestamp`	`bool`	Whether to parse extracted index as datetime.	`False`
`timestamp_format`	`str`	Datetime strptime format, used if as_timestamp is True. For the format codes, see https://docs.python.org/3/library/datetime.html#format-codes	`None`
`delimiter`	`str or None`	Field delimiter. If None (default), split on arbitrary whitespace (tabs/spaces). If your file uses a specific delimiter (e.g., comma or semicolon), pass `delimiter=','` or `delimiter=';'`, etc.	`None`
`file_first_row`	`{"ex","em"}`	Whether the first row contains Ex or Em wavelengths.	`'ex'`

Returns:

Name	Type	Description
`intensity`	`ndarray`	EEM matrix with shape (n_ex, n_em). Rows correspond to excitation wavelengths in ascending order, columns to emission wavelengths in ascending order. The smallest excitation and emission wavelengths correspond to intensity[0, 0].
`ex_range`	`ndarray`	Sorted excitation wavelengths (ascending).
`em_range`	`ndarray`	Sorted emission wavelengths (ascending).
`index`	`str \| datetime \| None`	Extracted index (optionally parsed as datetime).
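The default tabular layout and the documented return orientation can be illustrated with a small parser. This is a sketch under stated assumptions rather than eempy's implementation: `parse_eem_text` and `_is_number` are hypothetical helpers, the input is a string instead of a file, and empty tokens are simply dropped.

```python
import numpy as np


def _is_number(token):
    """Return True if the token can be parsed as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_eem_text(text, delimiter=None, file_first_row="ex"):
    """Hypothetical helper: parse the default tabular EEM layout from a string."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header row: ignore blank or non-numeric tokens such as a top-left label.
    header = np.array(
        [float(t) for t in lines[0].split(delimiter) if _is_number(t.strip())]
    )
    first_col, rows = [], []
    for ln in lines[1:]:
        tokens = [t.strip() for t in ln.split(delimiter) if t.strip()]
        first_col.append(float(tokens[0]))
        rows.append([float(t) for t in tokens[1:]])
    first_col = np.array(first_col)
    table = np.array(rows)  # shape: (len(first_col), len(header))

    if file_first_row == "ex":
        ex, em = header, first_col
        intensity = table.T  # reorient to (n_ex, n_em)
    else:
        ex, em = first_col, header
        intensity = table

    # Sort both axes ascending so intensity[0, 0] is the smallest Ex and Em.
    ex_order, em_order = np.argsort(ex), np.argsort(em)
    return intensity[np.ix_(ex_order, em_order)], ex[ex_order], em[em_order]


sample = """Em/Ex 260 250
300 1.0 2.0
310 3.0 4.0
320 5.0 6.0
"""
intensity, ex_range, em_range = parse_eem_text(sample)
```

The sample deliberately lists Ex in descending order to show that the returned axes are sorted and that the intensity matrix is reordered accordingly.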

read_eem_dataset

read_eem_dataset(folder_path: str, mandatory_keywords=None, optional_keywords=None, data_format: str = 'default', index_pos: Union[Tuple, List, None] = None, as_timestamp=False, timestamp_format=None, delimiter=None, file_first_row='ex', custom_filename_list: Union[Tuple, List, None] = None, wavelength_alignment=False, interpolation_method: str = 'linear')

Import EEM dataset from files.

Parameters:

Name	Type	Description	Default
`folder_path`	`str`	The path to the folder containing EEMs.	required
`mandatory_keywords`	`list of str`	Filenames must contain all of these keywords (logical AND). Example: ["PEM", "2021-02-02"] matches only files containing both substrings.	`None`
`optional_keywords`	`list of str`	Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords.	`None`
`data_format`	`str, {'default'}`	Format of the EEM file. With `'default'`, the following tabular format is supported: the first row contains excitation wavelengths (Ex); the top-left cell may be blank or non-numeric (non-numeric tokens are ignored); the first column of each subsequent row contains an emission wavelength (Em); the remaining cells are fluorescence intensities for each (Ex, Em) pair. Schematic layout: `<blank or label> Ex_1 Ex_2 Ex_3 ... Ex_n Em_1 I_11 I_12 I_13 ... I_1n Em_2 I_21 I_22 I_23 ... I_2n ... Em_m I_m1 I_m2 I_m3 ... I_mn`, where I_ij is the intensity at emission Em_i and excitation Ex_j. If your files use the same layout but with Em wavelengths in the first row and Ex wavelengths in the first column, pass `file_first_row='em'`.	`'default'`
`index_pos`	`tuple/list or None`	The starting and ending positions of index in filenames. For example, if you want to read the index "2024_01_01" from the file with the name "EEM_2024_01_01_PEM.dat", a tuple (4, 13) should be passed to this parameter.	`None`
`as_timestamp`	`bool`	Whether to read the index as timestamps.	`False`
`timestamp_format`	`str`	Datetime strptime format, used if as_timestamp is True. For the format codes, see https://docs.python.org/3/library/datetime.html#format-codes	`None`
`delimiter`	`str or None`	Field delimiter. If None (default), split on arbitrary whitespace (tabs/spaces). If your file uses a specific delimiter (e.g., comma or semicolon), pass `delimiter=','` or `delimiter=';'`, etc.	`None`
`file_first_row`	`{"ex","em"}`	Whether the first row contains Ex or Em wavelengths.	`'ex'`
`custom_filename_list`	`list or None`	If a list is passed, only the EEM files whose filenames are specified in the list will be imported.	`None`
`wavelength_alignment`	`bool`	Align the ex/em ranges of the EEMs. This is useful if the EEMs are measured with different ex/em ranges. Note that ex/em will be aligned according to the ex/em ranges with the smallest intervals among all the imported EEMs.	`False`
`interpolation_method`	`str`	The interpolation method used for aligning ex/em. Only used when wavelength_alignment=True.	`'linear'`

Returns:

Name	Type	Description
`eem_stack`	`ndarray`	EEM stack with shape (n_sample, n_ex, n_em). For each EEM with shape (n_ex, n_em), rows correspond to excitation wavelengths in ascending order and columns to emission wavelengths in ascending order. The smallest excitation and emission wavelengths correspond to intensity[0, 0].
`ex_range`	`ndarray`	Sorted excitation wavelengths (ascending).
`em_range`	`ndarray`	Sorted emission wavelengths (ascending).
`indexes`	`list of str \| datetime \| None`	Extracted per-file indexes (optionally parsed as datetime).
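As an illustration of how `timestamp_format` relates to the index strings, the sketch below parses date-like indexes with the standard library's `datetime.strptime`. The index strings are made-up examples assumed to have already been extracted from filenames; how eempy performs the extraction and parsing internally is not shown here.

```python
from datetime import datetime

# Hypothetical index strings, assumed to be already extracted from filenames.
index_strings = ["2024_01_01", "2024_01_02", "2024_02_15"]

# Format codes follow the strptime rules: %Y = 4-digit year, %m = month, %d = day.
timestamp_format = "%Y_%m_%d"

indexes = [datetime.strptime(s, timestamp_format) for s in index_strings]
```

A format string that does not match the extracted substring makes `strptime` raise `ValueError`, so the format needs to mirror the filename pattern exactly, including separators.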

read_parafac_model_from_openfluor

read_parafac_model_from_openfluor(filepath)

Import a PARAFAC model from a text file written in the format suggested by OpenFluor (https://openfluor.lablicate.com/). Note that models downloaded from OpenFluor normally don't have scores.

Parameters:

Name	Type	Description	Default
`filepath`	`str`	The filepath to the model file.	required

Returns:

Name	Type	Description
`ex_df`	`DataFrame`	Excitation loadings
`em_df`	`DataFrame`	Emission loadings
`score_df`	`DataFrame or None`	Scores, if any
`info_dict`	`dict`	A dictionary containing the model information

read_parafac_models

read_parafac_models(datdir, kw)

Search a folder for PARAFAC model files whose filenames contain a keyword and import each of them using read_parafac_model().

Parameters:

Name	Type	Description	Default
`datdir`	`str`	Folder containing PARAFAC model files.	required
`kw`	`str`	Keyword used to filter filenames.	required

Returns:

Type	Description
`list of dict`	A list of dicts with keys: `info`, `ex`, `em`, `score`.

read_reference_from_text

read_reference_from_text(filepath)

Read reference data from a text file. The reference data can be any 1D data (e.g., dissolved organic carbon concentration). The first line of the file should be a header, and each following line contains one data point, with no separators other than line breaks. For example:

    DOC (mg/L)
    1.0
    2.0
    ...
    5.0

Parameters:

Name	Type	Description	Default
`filepath`	`str`	The filepath to the reference file.	required

Returns:

Name	Type	Description
`reference_data`	`list of float`	The reference values (one per line after the header).
`header`	`str`	The header string from the first line.
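The file layout described above (one header line followed by one value per line) can be parsed along these lines. This is a minimal sketch, not eempy's implementation: `parse_reference_text` is a hypothetical helper that works on a string, and skipping blank lines is an assumption.

```python
def parse_reference_text(text):
    """Hypothetical helper: split reference text into float values and a header."""
    # Assumption: blank lines carry no data and can be skipped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    values = [float(ln) for ln in lines[1:]]
    return values, header


sample = "DOC (mg/L)\n1.0\n2.0\n5.0\n"
reference_data, header = parse_reference_text(sample)
```

The return order mirrors the documented one: the list of values first, then the header string.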