read_data
eempy.read_data.read_data
Functions for importing files.

Author: Yongmin Hu (yongminhu@outlook.com)

Last update: 2026-01
get_filelist
get_filelist(folderpath, mandatory_keywords, optional_keywords)
Get a list containing all filenames with given keywords in a folder. A filename must contain all mandatory keywords and at least one of the optional keywords.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `folderpath` | str | Folder path to search. | required |
| `mandatory_keywords` | str or list of str | Filenames must contain all of these keywords (logical AND). | required |
| `optional_keywords` | str or list of str | Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords. | required |
Returns:
| Type | Description |
|---|---|
| list of str | Filenames matching the keyword filters. |
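
The AND/OR keyword rule described above can be illustrated with a minimal, self-contained sketch. This is not the eempy implementation: `filter_filenames` and `_as_list` are hypothetical helpers, and the sketch assumes plain case-sensitive substring matching on a list of names rather than a folder scan.

```python
from typing import Iterable, List, Optional, Union

Keywords = Optional[Union[str, Iterable[str]]]


def _as_list(keywords: Keywords) -> List[str]:
    """Normalise None, a single string, or an iterable of strings to a list."""
    if keywords is None:
        return []
    if isinstance(keywords, str):
        return [keywords]
    return list(keywords)


def filter_filenames(filenames, mandatory_keywords=None, optional_keywords=None):
    """Hypothetical helper: keep names that satisfy the AND/OR keyword rule."""
    mandatory = _as_list(mandatory_keywords)
    optional = _as_list(optional_keywords)
    matched = []
    for name in filenames:
        # Every mandatory keyword must be a substring of the name (logical AND).
        if not all(kw in name for kw in mandatory):
            continue
        # If optional keywords are given, at least one must match (logical OR).
        if optional and not any(kw in name for kw in optional):
            continue
        matched.append(name)
    return matched


names = [
    "ABS_2021-02-02_PEM.dat",
    "EEM_2021-02-02_PEM.dat",
    "EEM_2021-02-03_SEM.dat",
]
pem_eems = filter_filenames(names, ["EEM", "2021-02-02"], ["PEM", "SEM"])
all_eems = filter_filenames(names, "EEM", None)
```

Here `pem_eems` keeps only the file that contains both mandatory keywords and one of the optional ones, while `all_eems` shows that an empty optional list adds no further filtering.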
read_abs
read_abs(file_path, index_pos: Union[Tuple, List, None] = None, data_format='default')
Import UV absorbance data from a UV absorbance file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | str | The filepath to the UV absorbance file. | required |
| `index_pos` | None or tuple with two elements | The starting and ending positions of the index in the filename. For example, to read the index "2024_01_01" from a file named "EEM_2024_01_01_PEM.dat", pass the tuple (4, 13) to this parameter. | `None` |
| `data_format` | str | Format of the UV absorbance file. By passing … where A_i corresponds to the absorbance at wavelength Ex_i. | `'default'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `absorbance` | ndarray | The UV absorbance spectrum (1D array). |
| `ex_range` | ndarray | The excitation wavelengths (1D array). |
| `index` | str | The index of the absorbance spectrum. |
read_abs_dataset
read_abs_dataset(folder_path, mandatory_keywords='ABS', optional_keywords=[], data_format: str = 'default', index_pos: Union[Tuple, List, None] = None, custom_filename_list: Union[Tuple, List, None] = None, wavelength_alignment=False, interpolation_method: str = 'linear')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `folder_path` | str | The path to the folder containing absorbance files. | required |
| `mandatory_keywords` | list of str | Filenames must contain all of these keywords (logical AND). Example: ["ABS", "2021-02-02"] matches only files containing both substrings. | `'ABS'` |
| `optional_keywords` | list of str | Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords. | `[]` |
| `data_format` | str | The format of the absorbance data files. | `'default'` |
| `index_pos` | tuple/list or None | The starting and ending positions of the index in the filenames. For example, to read the index "2024_01_01" from a file named "ABS_2024_01_01_PEM.dat", pass the tuple (4, 13) to this parameter. | `None` |
| `custom_filename_list` | list or None | If a list is passed, only the absorbance files whose filenames are in the list are imported. | `None` |
| `wavelength_alignment` | bool | Align the ex ranges of the absorbance files. This is useful if the absorbance spectra were measured with different ex ranges. Note that ex is aligned to the ex range with the smallest intervals among all the imported absorbance files. | `False` |
| `interpolation_method` | str | The interpolation method used for aligning ex. Only used if wavelength_alignment=True. | `'linear'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `abs_stack` | ndarray | A stack of imported absorbance files (n_samples, n_wavelengths). |
| `ex_range` | ndarray | The excitation wavelengths (1D array). |
| `indexes` | list or None | The list of absorbance file indexes (if index_pos is specified). |
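
To make the idea behind `wavelength_alignment=True` concrete, the following sketch interpolates spectra measured on different excitation grids onto the grid with the smallest spacing. It is an illustration under stated assumptions, not eempy's code: `align_spectra` is a hypothetical helper, linear interpolation is done with `numpy.interp`, the finest grid is picked by mean spacing, and each spectrum is assumed to have at least two points.

```python
import numpy as np


def align_spectra(wavelengths_list, values_list):
    """Hypothetical helper: interpolate 1D spectra onto the finest grid."""
    # Sort each spectrum by wavelength; np.interp needs increasing x values.
    sorted_pairs = []
    for wl, val in zip(wavelengths_list, values_list):
        wl = np.asarray(wl, dtype=float)
        val = np.asarray(val, dtype=float)
        order = np.argsort(wl)
        sorted_pairs.append((wl[order], val[order]))

    # Assumption: the target grid is the one with the smallest mean spacing.
    target = min(
        (wl for wl, _ in sorted_pairs),
        key=lambda wl: np.mean(np.diff(wl)),
    )

    # Linear interpolation; outside a spectrum's own range np.interp
    # repeats the edge value instead of extrapolating.
    stack = np.vstack([np.interp(target, wl, val) for wl, val in sorted_pairs])
    return stack, target


coarse_wl, coarse_abs = [250, 260, 270], [0.30, 0.20, 0.10]
fine_wl, fine_abs = [250, 255, 260, 265, 270], [0.32, 0.27, 0.22, 0.17, 0.12]

abs_stack, ex_range = align_spectra([coarse_wl, fine_wl], [coarse_abs, fine_abs])
```

The result has one row per sample and one column per wavelength of the finer grid, matching the `(n_samples, n_wavelengths)` layout described for `abs_stack`.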
read_eem
read_eem(file_path: str, index_pos: Union[Tuple[int, int], List[int], None] = None, data_format: str = 'default', as_timestamp: bool = False, timestamp_format: Optional[str] = None, delimiter: Optional[str] = None, file_first_row: str = 'ex')
Import EEM from file (tabular format).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | str | Filepath to the EEM file. | required |
| `index_pos` | None or tuple/list with two elements | Start and end positions of the index in the filename (1-based start, inclusive end, as in the original code's intent). Example: (4, 13) extracts basename[3:13]. | `None` |
| `data_format` | str, {'default'} | Format of the EEM file. By passing … Schematic layout: … where I_nm is the intensity at excitation Ex_n and emission Em_m. If your files are in a similar format but with the first row being Em wavelengths and the first column being Ex wavelengths, pass … | `'default'` |
| `as_timestamp` | bool | Whether to parse the extracted index as a datetime. | `False` |
| `timestamp_format` | str | Datetime strptime format, used if as_timestamp is True. The format codes are listed at https://docs.python.org/3/library/datetime.html#format-codes | `None` |
| `delimiter` | Optional[str] | Field delimiter. If None (default), split on arbitrary whitespace (tabs/spaces). If your file uses a specific delimiter (e.g., comma or semicolon), pass … | `None` |
| `file_first_row` | {"ex", "em"} | Whether the first row contains Ex or Em wavelengths. | `'ex'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `intensity` | ndarray | EEM matrix with shape (n_ex, n_em). Rows correspond to excitation wavelengths in ascending order, columns to emission wavelengths in ascending order. The smallest excitation and emission wavelengths correspond to intensity[0, 0]. |
| `ex_range` | ndarray | Sorted excitation wavelengths (ascending). |
| `em_range` | ndarray | Sorted emission wavelengths (ascending). |
| `index` | str, datetime, or None | Extracted index (optionally parsed as datetime). |
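
The return convention above (shape `(n_ex, n_em)`, both axes ascending, smallest wavelengths at `intensity[0, 0]`) can be sketched as follows. This is a hedged illustration, not the library's parser: `parse_eem_text` is a hypothetical function, and the file layout is an assumption (first row holds only Ex wavelengths with no corner cell, and each following row starts with an Em wavelength followed by the intensities).

```python
import numpy as np


def parse_eem_text(text, delimiter=None):
    """Hypothetical parser for an assumed layout: Ex in row 1, Em in column 1."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # delimiter=None splits on arbitrary whitespace, like str.split().
    ex = np.array([float(tok) for tok in lines[0].split(delimiter)])
    table = np.array(
        [[float(tok) for tok in ln.split(delimiter)] for ln in lines[1:]]
    )
    em = table[:, 0]
    data = table[:, 1:]          # shape (n_em, n_ex) as laid out in the file

    intensity = data.T           # shape (n_ex, n_em)

    # Reorder both axes so wavelengths are ascending.
    ex_order = np.argsort(ex)
    em_order = np.argsort(em)
    intensity = intensity[ex_order][:, em_order]
    return intensity, ex[ex_order], em[em_order]


# Emission rows are deliberately listed in descending order.
sample = """250 260 270
400 1.0 2.0 3.0
300 4.0 5.0 6.0
"""

intensity, ex_range, em_range = parse_eem_text(sample)
```

In this example the value stored for Ex 250 and Em 300 ends up at `intensity[0, 0]`, even though its row comes last in the text.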
read_eem_dataset
read_eem_dataset(folder_path: str, mandatory_keywords=None, optional_keywords=None, data_format: str = 'default', index_pos: Union[Tuple, List, None] = None, as_timestamp=False, timestamp_format=None, delimiter=None, file_first_row='ex', custom_filename_list: Union[Tuple, List, None] = None, wavelength_alignment=False, interpolation_method: str = 'linear')
Import EEM dataset from files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `folder_path` | str | The path to the folder containing EEMs. | required |
| `mandatory_keywords` | list of str | Filenames must contain all of these keywords (logical AND). Example: ["PEM", "2021-02-02"] matches only files containing both substrings. | `None` |
| `optional_keywords` | list of str | Filenames must contain at least one of these keywords (logical OR). If empty or None, no additional filtering is applied beyond mandatory_keywords. | `None` |
| `data_format` | str, {'default'} | Format of the EEM file. By passing … Schematic layout: … where I_nm is the intensity at excitation Ex_n and emission Em_m. If your files are in a similar format but with the first row being Em wavelengths and the first column being Ex wavelengths, pass … | `'default'` |
| `index_pos` | tuple/list or None | The starting and ending positions of the index in the filenames. For example, to read the index "2024_01_01" from a file named "EEM_2024_01_01_PEM.dat", pass the tuple (4, 13) to this parameter. | `None` |
| `as_timestamp` | bool | Whether to read the indexes as timestamps. | `False` |
| `timestamp_format` | str | Datetime strptime format, used if as_timestamp is True. The format codes are listed at https://docs.python.org/3/library/datetime.html#format-codes | `None` |
| `delimiter` | Optional[str] | Field delimiter. If None (default), split on arbitrary whitespace (tabs/spaces). If your file uses a specific delimiter (e.g., comma or semicolon), pass … | `None` |
| `file_first_row` | {"ex", "em"} | Whether the first row contains Ex or Em wavelengths. | `'ex'` |
| `custom_filename_list` | list or None | If a list is passed, only the EEM files whose filenames are in the list are imported. | `None` |
| `wavelength_alignment` | bool | Align the ex/em ranges of the EEMs. This is useful if the EEMs were measured with different ex/em ranges. Note that ex/em are aligned to the ex/em ranges with the smallest intervals among all the imported EEMs. | `False` |
| `interpolation_method` | str | The interpolation method used for aligning ex/em. Only used if wavelength_alignment=True. | `'linear'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `eem_stack` | ndarray | EEM stack with shape (n_sample, n_ex, n_em). For each EEM with shape (n_ex, n_em), rows correspond to excitation wavelengths in ascending order and columns to emission wavelengths in ascending order. The smallest excitation and emission wavelengths correspond to intensity[0, 0]. |
| `ex_range` | ndarray | Sorted excitation wavelengths (ascending). |
| `em_range` | ndarray | Sorted emission wavelengths (ascending). |
| `indexes` | list of str, datetime, or None | Extracted per-file indexes (optionally parsed as datetime). |
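
As a small illustration of what `as_timestamp` and `timestamp_format` imply, the sketch below parses index strings with the standard library's `datetime.strptime`. The index strings and the format `"%Y_%m_%d"` are made-up examples, and how eempy applies the format internally is not shown here.

```python
from datetime import datetime

# Example index strings, as they might be extracted from filenames
# such as "EEM_2024_01_01_PEM.dat" (assumed, for illustration only).
indexes = ["2024_01_03", "2024_01_01", "2024_01_02"]

# strptime format codes: %Y = four-digit year, %m = month, %d = day.
timestamp_format = "%Y_%m_%d"

timestamps = [datetime.strptime(ix, timestamp_format) for ix in indexes]

# Datetime indexes can be ordered chronologically, e.g. to sort samples.
chronological_order = sorted(range(len(timestamps)), key=timestamps.__getitem__)
```

A format string that does not match the extracted text makes `strptime` raise `ValueError`, so the format has to describe the index exactly, including separators.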
read_parafac_model_from_openfluor
read_parafac_model_from_openfluor(filepath)
Import a PARAFAC model from a text file written in the format suggested by OpenFluor (https://openfluor.lablicate.com/). Note that models downloaded from OpenFluor normally don't have scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | str | The filepath to the model file. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `ex_df` | DataFrame | Excitation loadings. |
| `em_df` | DataFrame | Emission loadings. |
| `score_df` | DataFrame or None | Scores (if any). |
| `info_dict` | dict | A dictionary containing the model information. |
read_parafac_models
read_parafac_models(datdir, kw)
Search a folder for PARAFAC models by a keyword in their filenames and import all of them into a dictionary using read_parafac_model().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `datdir` | str | Folder containing PARAFAC model files. | required |
| `kw` | str | Keyword used to filter filenames. | required |
Returns:
| Type | Description |
|---|---|
| list of dict | A list of dicts with keys: … |
read_reference_from_text
read_reference_from_text(filepath)
Read reference data from a text file. The reference data can be any 1D data (e.g., dissolved organic carbon concentration). The first line of the file should be a header, and each following line contains one data point, with no separators other than line breaks. For example:
DOC (mg/L)
1.0
2.0
...
5.0
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | str | The filepath to the reference file. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `reference_data` | list of float | The reference values (one per line after the header). |
| `header` | str | The header string from the first line. |
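
The file format described above (one header line, then one value per line) is simple enough to sketch with the standard library. This is an illustrative stand-in rather than eempy's code: `read_reference_sketch` is a hypothetical function, and skipping blank lines is an assumption of the sketch.

```python
import os
import tempfile


def read_reference_sketch(filepath):
    """Hypothetical reader: header on line 1, one float per following line."""
    with open(filepath, "r", encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    header = lines[0]
    # Assumption: blank lines (e.g. a trailing newline) are ignored.
    reference_data = [float(line) for line in lines[1:] if line]
    return reference_data, header


# Write a small example file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "doc_reference.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("DOC (mg/L)\n1.0\n2.0\n5.0\n")
    reference_data, header = read_reference_sketch(path)
```

The return order mirrors the table above: the values first, then the header string.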