read_data

eempy.read_data.read_data

Functions for importing files.

Author: Yongmin Hu (yongminhu@outlook.com)

Last update: 2026-01

get_filelist

get_filelist(folderpath, mandatory_keywords, optional_keywords)

Get a list containing all filenames with given keywords in a folder. A filename must contain all mandatory keywords and at least one of the optional keywords.

Parameters:

Name Type Description Default
folderpath str

Folder path to search.

required
mandatory_keywords str or list of str

Filenames must contain all of these keywords (logical AND).

required
optional_keywords str or list of str

Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords.

required

Returns:

Type Description
list of str

Filenames matching the keyword filters.
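
The AND/OR rule described above can be illustrated with a small standalone sketch. This is not the eempy implementation: `filter_filenames` is a hypothetical helper that works on an in-memory list of names rather than a folder, and it assumes plain substring matching.

```python
def filter_filenames(filenames, mandatory_keywords=None, optional_keywords=None):
    """Hypothetical helper mirroring the documented AND/OR keyword rules."""
    def as_list(kw):
        # Accept a single string, a list of strings, or None.
        if kw is None:
            return []
        return [kw] if isinstance(kw, str) else list(kw)

    mandatory = as_list(mandatory_keywords)
    optional = as_list(optional_keywords)
    matches = []
    for name in filenames:
        if not all(kw in name for kw in mandatory):
            continue  # fails the logical AND
        if optional and not any(kw in name for kw in optional):
            continue  # fails the logical OR
        matches.append(name)
    return matches


names = ["EEM_2024_01_01_PEM.dat", "EEM_2024_01_02_SEM.dat", "ABS_2024_01_01.dat"]
selected = filter_filenames(names, "EEM", ["PEM", "SEM"])
abs_only = filter_filenames(names, ["ABS", "2024_01_01"], None)
```

Here `selected` holds the two EEM files, and `abs_only` holds only the absorbance file because no optional filter is applied when `optional_keywords` is empty or None.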

read_abs

read_abs(file_path, index_pos: Union[Tuple, List, None] = None, data_format='default')

Import UV absorbance data from a UV absorbance file.

Parameters:

Name Type Description Default
file_path str

The filepath to the UV absorbance file

required
index_pos None or tuple with two elements

The starting and ending positions of index in filenames. For example, if you want to read the index "2024_01_01" from the file with the name "EEM_2024_01_01_PEM.dat", a tuple (4, 13) should be passed to this parameter.

None
data_format str

Format of the UV absorbance file. By passing data_format='default', the following format is supported:

Ex_1    A_1
Ex_2    A_2
...     ...
Ex_n    A_n

where A_i is the absorbance at wavelength Ex_i.

'default'

Returns:

Name Type Description
absorbance ndarray

The UV absorbance spectrum (1D array).

ex_range ndarray

The excitation wavelengths (1D array).

index str

The index of the absorbance spectrum.
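
To make the 'default' two-column layout and the `index_pos` example concrete, here is a minimal standalone sketch. `parse_abs_text` and `extract_index` are hypothetical helpers, not part of eempy. The sketch assumes whitespace-separated columns and treats `index_pos` as a zero-based start with an inclusive end, which is the reading implied by the "2024_01_01" example.

```python
def parse_abs_text(text):
    """Hypothetical parser for the two-column 'default' layout (Ex, A)."""
    ex_range, absorbance = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        ex_range.append(float(parts[0]))
        absorbance.append(float(parts[1]))
    return absorbance, ex_range


def extract_index(filename, index_pos):
    # Assumption: zero-based start and inclusive end, as the index_pos example implies.
    start, end = index_pos
    return filename[start:end + 1]


sample = "240\t0.512\n245\t0.498\n250\t0.471\n"
absorbance, ex_range = parse_abs_text(sample)
index = extract_index("EEM_2024_01_01_PEM.dat", (4, 13))
```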

read_abs_dataset

read_abs_dataset(folder_path, mandatory_keywords='ABS', optional_keywords=[], data_format: str = 'default', index_pos: Union[Tuple, List, None] = None, custom_filename_list: Union[Tuple, List, None] = None, wavelength_alignment=False, interpolation_method: str = 'linear')

Import a UV absorbance dataset from files.

Parameters:

Name Type Description Default
folder_path str

The path to the folder containing absorbance files.

required
mandatory_keywords list of str

Filenames must contain all of these keywords (logical AND). Example: ["ABS", "2021-02-02"] will match only files containing both substrings.

'ABS'
optional_keywords list of str

Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords.

[]
data_format str

Specify the type of absorbance data format.

'default'
index_pos tuple/list or None

The starting and ending positions of index in filenames. For example, if you want to read the index "2024_01_01" from the file with the name "ABS_2024_01_01_PEM.dat", a tuple (4, 13) should be passed to this parameter.

None
custom_filename_list list or None

If a list is passed, only the absorbance files whose filenames are specified in the list will be imported.

None
wavelength_alignment bool

Align the ex ranges of the absorbance files. This is useful if the absorbance spectra were measured over different ex ranges. Note that ex will be aligned to the ex range with the smallest interval among all the imported absorbance files.

False
interpolation_method str

The interpolation method used for aligning ex. Only used if wavelength_alignment=True.

'linear'

Returns:

Name Type Description
abs_stack ndarray

A stack of imported absorbance files (n_samples, n_wavelengths).

ex_range ndarray

The excitation wavelengths (1D array).

indexes list or None

The list of absorbance file indexes (if index_pos is specified).
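
The idea behind `wavelength_alignment` with linear interpolation can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `interp_linear` and `align_spectra` are hypothetical names, the common axis is taken to be the input grid with the smallest wavelength step, and values outside a spectrum's measured range are clamped to its end values.

```python
from bisect import bisect_right


def interp_linear(x_new, x, y):
    """Linear interpolation of (x, y) at x_new; x must be ascending."""
    out = []
    for xn in x_new:
        # Clamp to the end values outside the measured range (a choice made for this sketch).
        if xn <= x[0]:
            out.append(y[0])
        elif xn >= x[-1]:
            out.append(y[-1])
        else:
            i = bisect_right(x, xn) - 1
            t = (xn - x[i]) / (x[i + 1] - x[i])
            out.append(y[i] + t * (y[i + 1] - y[i]))
    return out


def align_spectra(spectra):
    """spectra: list of (ex_range, absorbance) pairs with ascending ex_range."""
    # Use the grid with the smallest wavelength step as the common axis (assumed rule).
    def step(ex):
        return min(b - a for a, b in zip(ex, ex[1:]))
    target = min((ex for ex, _ in spectra), key=step)
    stack = [interp_linear(target, ex, a) for ex, a in spectra]
    return stack, target


coarse = ([240.0, 250.0, 260.0], [1.0, 0.8, 0.6])
fine = ([240.0, 245.0, 250.0, 255.0, 260.0], [1.0, 0.9, 0.8, 0.7, 0.6])
abs_stack, ex_common = align_spectra([coarse, fine])
```

After alignment both rows of `abs_stack` share the five-point axis `ex_common`, so they can be stacked into an (n_samples, n_wavelengths) array.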

read_eem

read_eem(file_path: str, index_pos: Union[Tuple[int, int], List[int], None] = None, data_format: str = 'default', as_timestamp: bool = False, timestamp_format: Optional[str] = None, delimiter: Optional[str] = None, file_first_row: str = 'ex')

Import EEM from file (tabular format).

Parameters:

Name Type Description Default
file_path str

Filepath to the EEM file.

required
index_pos None or tuple/list with two elements

Start/end positions of index in filename (1-based start, end inclusive as in original code intent). Example: (4, 13) extracts basename[3:13].

None
data_format str, {'default'}

Format of the EEM file. By passing 'default', the following tabular format is supported:

- The first row contains excitation wavelengths (Ex).
- The top left may be blank or non-numeric; any non-numeric tokens are ignored.
- The first column of subsequent rows contains emission wavelengths (Em).
- The remaining cells are fluorescence intensities for each (Ex, Em) pair.

Schematic layout:

<blank or label>   Ex_1   Ex_2   Ex_3   ...   Ex_n
Em_1               I_11   I_12   I_13   ...   I_1n
Em_2               I_21   I_22   I_23   ...   I_2n
...
Em_m               I_m1   I_m2   I_m3   ...   I_mn

Here I_ij is the intensity at emission Em_i and excitation Ex_j. If your files use a similar format but with Em wavelengths in the first row and Ex wavelengths in the first column, pass file_first_row='em'.

'default'
as_timestamp bool

Whether to parse extracted index as datetime.

False
timestamp_format str

Datetime strptime format if as_timestamp is True. Rules can be seen on https://docs.python.org/3/library/datetime.html#format-codes

None
delimiter str or None

Field delimiter. If None (default), split on arbitrary whitespace (tabs/spaces). If your file uses a specific delimiter (e.g., comma or semicolon), pass delimiter=',' or delimiter=';', etc.

None
file_first_row {"ex","em"}

Whether the first row contains Ex or Em wavelengths.

'ex'

Returns:

Name Type Description
intensity ndarray

EEM matrix with shape (n_ex, n_em). Rows correspond to excitation wavelengths in ascending order, columns to emission wavelengths in ascending order, so intensity[0, 0] corresponds to the smallest excitation and emission wavelengths.

ex_range ndarray

Sorted excitation wavelengths (ascending).

em_range ndarray

Sorted emission wavelengths (ascending).

index str | datetime | None

Extracted index (optionally parsed as datetime).
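
The 'default' layout and the returned orientation can be illustrated with a small standalone parser. This is a sketch of the documented behaviour, not the eempy source: `parse_eem_text` and `_to_float` are hypothetical names, only the file_first_row='ex' case is handled, and nested lists stand in for the ndarray outputs.

```python
def _to_float(token):
    try:
        return float(token)
    except ValueError:
        return None  # non-numeric token (e.g. a corner label)


def parse_eem_text(text, delimiter=None):
    """Hypothetical parser for the 'default' layout with Ex in the first row."""
    rows = [line.split(delimiter) for line in text.splitlines() if line.strip()]
    # Header row: keep numeric tokens only, so a blank or labelled corner is ignored.
    ex = [v for v in map(_to_float, rows[0]) if v is not None]
    em, by_em = [], []
    for row in rows[1:]:
        values = [float(tok) for tok in row if tok.strip()]
        em.append(values[0])
        by_em.append(values[1:1 + len(ex)])
    # Sort both axes ascending and transpose to (n_ex, n_em).
    ex_order = sorted(range(len(ex)), key=ex.__getitem__)
    em_order = sorted(range(len(em)), key=em.__getitem__)
    intensity = [[by_em[j][i] for j in em_order] for i in ex_order]
    return intensity, [ex[i] for i in ex_order], [em[j] for j in em_order]


sample = (
    "Em/Ex\t250\t240\n"
    "300\t1.0\t2.0\n"
    "310\t3.0\t4.0\n"
    "320\t5.0\t6.0\n"
)
intensity, ex_range, em_range = parse_eem_text(sample)
```

The sample deliberately lists Ex in descending order to show that the output axes are sorted and that `intensity[0][0]` belongs to the smallest Ex and Em.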

read_eem_dataset

read_eem_dataset(folder_path: str, mandatory_keywords=None, optional_keywords=None, data_format: str = 'default', index_pos: Union[Tuple, List, None] = None, as_timestamp=False, timestamp_format=None, delimiter=None, file_first_row='ex', custom_filename_list: Union[Tuple, List, None] = None, wavelength_alignment=False, interpolation_method: str = 'linear')

Import EEM dataset from files.

Parameters:

Name Type Description Default
folder_path str

The path to the folder containing EEMs.

required
mandatory_keywords list of str

Filenames must contain all of these keywords (logical AND). Example: ["PEM", "2021-02-02"] will match only files containing both substrings.

None
optional_keywords list of str

Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords.

None
data_format str, {'default'}

Format of the EEM file. By passing 'default', the following tabular format is supported:

- The first row contains excitation wavelengths (Ex).
- The top left may be blank or non-numeric; any non-numeric tokens are ignored.
- The first column of subsequent rows contains emission wavelengths (Em).
- The remaining cells are fluorescence intensities for each (Ex, Em) pair.

Schematic layout:

<blank or label>   Ex_1   Ex_2   Ex_3   ...   Ex_n
Em_1               I_11   I_12   I_13   ...   I_1n
Em_2               I_21   I_22   I_23   ...   I_2n
...
Em_m               I_m1   I_m2   I_m3   ...   I_mn

Here I_ij is the intensity at emission Em_i and excitation Ex_j. If your files use a similar format but with Em wavelengths in the first row and Ex wavelengths in the first column, pass file_first_row='em'.

'default'
index_pos tuple/list or None

The starting and ending positions of index in filenames. For example, if you want to read the index "2024_01_01" from the file with the name "EEM_2024_01_01_PEM.dat", a tuple (4, 13) should be passed to this parameter.

None
as_timestamp bool

Whether to read the index as timestamps.

False
timestamp_format str

Datetime strptime format if as_timestamp is True. Rules can be seen on https://docs.python.org/3/library/datetime.html#format-codes

None
delimiter str or None

Field delimiter. If None (default), split on arbitrary whitespace (tabs/spaces). If your file uses a specific delimiter (e.g., comma or semicolon), pass delimiter=',' or delimiter=';', etc.

None
file_first_row {"ex","em"}

Whether the first row contains Ex or Em wavelengths.

'ex'
custom_filename_list list or None

If a list is passed, only the EEM files whose filenames are specified in the list will be imported.

None
wavelength_alignment bool

Align the ex/em ranges of the EEMs. This is useful if the EEMs were measured over different ex/em ranges. Note that ex/em will be aligned to the ex/em ranges with the smallest intervals among all the imported EEMs.

False
interpolation_method str

The interpolation method used for aligning ex/em. Only used if wavelength_alignment=True.

'linear'

Returns:

Name Type Description
eem_stack ndarray

EEM stack with shape (n_samples, n_ex, n_em). For each EEM with shape (n_ex, n_em), rows correspond to excitation wavelengths in ascending order and columns to emission wavelengths in ascending order, so element [0, 0] of each EEM corresponds to the smallest excitation and emission wavelengths.

ex_range ndarray

Sorted excitation wavelengths (ascending).

em_range ndarray

Sorted emission wavelengths (ascending).

indexes list of str | datetime | None

Extracted per-file indexes (optionally parsed as datetime).
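
As an illustration of how `index_pos`, `as_timestamp` and `timestamp_format` fit together, the sketch below extracts the date substring from each filename and parses it with the standard library. It is independent of eempy and assumes a zero-based start with an inclusive end for `index_pos`, matching the "2024_01_01" example above; the second filename is made up for the example.

```python
from datetime import datetime

filenames = ["EEM_2024_01_01_PEM.dat", "EEM_2024_01_02_PEM.dat"]
# Assumption: zero-based start, inclusive end, matching the "2024_01_01" example.
start, end = 4, 13
# "%Y_%m_%d" is the strptime format for indexes such as "2024_01_01".
indexes = [datetime.strptime(name[start:end + 1], "%Y_%m_%d") for name in filenames]
```

With these inputs the equivalent `timestamp_format` argument would be "%Y_%m_%d".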

read_parafac_model_from_openfluor

read_parafac_model_from_openfluor(filepath)

Import a PARAFAC model from a text file written in the format suggested by OpenFluor (https://openfluor.lablicate.com/). Note that models downloaded from OpenFluor normally don't have scores.

Parameters:

Name Type Description Default
filepath str

The filepath to the model file.

required

Returns:

Name Type Description
ex_df DataFrame

Excitation loadings

em_df DataFrame

Emission loadings

score_df DataFrame or None

Scores, if any.

info_dict dict

A dictionary containing the model information

read_parafac_models

read_parafac_models(datdir, kw)

Search all PARAFAC models in a folder by keyword in filenames and import all of them into a dictionary using read_parafac_model()

Parameters:

Name Type Description Default
datdir str

Folder containing PARAFAC model files.

required
kw str

Keyword used to filter filenames.

required

Returns:

Type Description
list of dict

A list of dicts with keys: info, ex, em, score.

read_reference_from_text

read_reference_from_text(filepath)

Read reference data from a text file. The reference data can be any 1D data (e.g., dissolved organic carbon concentration). The first line of the file should be a header, and each following line contains one data point, with no separators other than line breaks. For example:

    DOC (mg/L)
    1.0
    2.0
    ...
    5.0

Parameters:

Name Type Description Default
filepath str

The filepath to the reference file.

required

Returns:

Name Type Description
reference_data list of float

The reference values (one per line after the header).

header str

The header string from the first line.
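
The file layout described above is simple enough to sketch with the standard library. `parse_reference` is a hypothetical stand-in that reads from any text stream, used here so the example runs without a file on disk. It is not the eempy implementation, and skipping blank lines is an assumption of this sketch.

```python
import io


def parse_reference(stream):
    """Hypothetical reader: first line is the header, then one float per line."""
    header = stream.readline().strip()
    # Blank lines are skipped (an assumption made for this sketch).
    reference_data = [float(line) for line in stream if line.strip()]
    return reference_data, header


sample = io.StringIO("DOC (mg/L)\n1.0\n2.0\n5.0\n")
reference_data, header = parse_reference(sample)
```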