radis.api.cache_files module
Tools to deal with HDF5 cache files. HDF5 cache files are used to cache Energy Database files and Line Database files, and yield much faster access times.
Routine Listing
- check_cache_file(fcache, use_cached=True, expected_metadata={}, compare_as_close=[], verbose=True, engine='guess')
Quick function that checks the status of a cache file generated by RADIS.
The function first checks the existence of fcache. What it does next depends on the value of use_cached:
- if True, check that the file exists and remove it if it is not valid.
- if 'regen', delete the cache file even if it is valid, so that it is regenerated later.
- if 'force', raise an error if the file doesn't exist.
Then it checks whether the file is deprecated (only the attributes are inspected; the file is never fully read). Deprecation is determined by check_not_deprecated(), which compares the metadata= content. If the file is deprecated, it is deleted so that it can be regenerated later, unless 'force' was used.
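The decision flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the RADIS implementation: resolve_cache, its boolean return value and the is_valid callable (a stand-in for the metadata check) are hypothetical names, and DeprecatedFileWarning is redefined locally so that the example runs on its own.

```python
import os
import tempfile


class DeprecatedFileWarning(UserWarning):
    """Local stand-in for the warning class named in this documentation."""


def resolve_cache(fcache, use_cached, is_valid):
    """Return True if `fcache` can be used, False if it must be regenerated.

    `is_valid` is a hypothetical callable standing in for the metadata check.
    """
    if not use_cached:
        return False  # caching disabled: nothing to check
    if not os.path.exists(fcache):
        if use_cached == "force":
            raise FileNotFoundError(f"cache file {fcache} does not exist")
        return False  # nothing to use yet: the caller generates the file
    if use_cached == "regen":
        os.remove(fcache)  # delete even a valid file so it is rebuilt later
        return False
    if is_valid(fcache):
        return True
    if use_cached == "force":
        raise DeprecatedFileWarning(f"cache file {fcache} is deprecated")
    os.remove(fcache)  # invalid file: remove it so it is regenerated
    return False


# An existing but invalid file is removed when use_cached is True.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "levels.h5")
    open(path, "w").close()
    usable = resolve_cache(path, True, is_valid=lambda f: False)
    still_there = os.path.exists(path)
```

In this sketch the caller regenerates the cache whenever the result is False, and in 'force' mode it has to catch the raised error itself.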
- Parameters:
fcache (str) – cache file name
use_cached (True, False, 'force', 'regen') – see notes above. Default True.
expected_metadata (dict) – attributes to check
compare_as_close (list of keys) – compare with np.isclose(a,b) rather than a==b
verbose (boolean) – if True, print messages
engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess.
- Returns:
whether the file was valid or not (and was removed). Raises a DeprecatedFileWarning for an invalid file in mode 'force'. The error can be caught by the parent function.
- Return type:
None
See also
- check_not_deprecated(file, metadata_is={}, metadata_keys_contain=[], compare_as_close=[], current_version=None, last_compatible_version='0.9.1', engine='guess')
Make sure the cache file is not deprecated: checks that metadata is the same, and that the version under which the file was generated is valid.
- Parameters:
file (str) – a .h5 cache file for Energy Levels
metadata_is (dict) – expected values for these variables in the file metadata. If the values don't match, a DeprecatedFileWarning() error is raised. If the file metadata contains additional keys/values, no error is raised.
metadata_keys_contain (list) – expected list of variables in the file metadata. If the keys are not there, a DeprecatedFileWarning() error is raised.
compare_as_close (list of keys) – compare with np.isclose(a,b) rather than a==b
- Other Parameters:
current_version (str, or None) – current version number. If the file was generated in a previous version, a warning is raised. If None, the current version is read from radis.__version__.
last_compatible_version (str) – if the file was generated in a non-compatible version, an error is raised (useful to force regeneration of certain cache files after a breaking change in a new version).
engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess.
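The role of metadata_is and compare_as_close can be illustrated with a small standalone sketch. It is not the RADIS code: metadata_mismatches is a hypothetical helper, math.isclose from the standard library stands in for np.isclose (with tolerances chosen to resemble NumPy's defaults), and treating a missing key as a mismatch is an assumption.

```python
import math


def metadata_mismatches(file_metadata, metadata_is, compare_as_close=()):
    """List the keys of `metadata_is` whose values differ in `file_metadata`.

    Extra keys in `file_metadata` are ignored, as described above.
    """
    mismatches = []
    for key, expected in metadata_is.items():
        if key not in file_metadata:
            mismatches.append(key)  # assumption: a missing key counts as a mismatch
        elif key in compare_as_close:
            # tolerant comparison for floating-point values
            if not math.isclose(file_metadata[key], expected, rel_tol=1e-5, abs_tol=1e-8):
                mismatches.append(key)
        elif file_metadata[key] != expected:
            mismatches.append(key)  # strict comparison, a == b
    return mismatches


stored = {"molecule": "CO2", "wavenum_min": 2300.0000001, "extra": 1}
expected = {"molecule": "CO2", "wavenum_min": 2300.0}

strict = metadata_mismatches(stored, expected)
tolerant = metadata_mismatches(stored, expected, compare_as_close=["wavenum_min"])
```

Here the strict comparison flags wavenum_min because of a rounding-level difference, while listing the key in compare_as_close accepts it. A non-empty result is the situation in which the documented function would raise a DeprecatedFileWarning.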
- check_relevancy(file, relevant_if_metadata_above, relevant_if_metadata_below, verbose=True, key='default', engine='guess')
Make sure the cache file is relevant.
Use case: checks that wavenumber min and wavenumber max in metadata are relevant for the specified spectral range.
- Parameters:
file (str) – a .h5 line database cache file
relevant_if_metadata_above, relevant_if_metadata_below (dict) – the file is relevant (and should be loaded) only if the file metadata value for each key of the dictionary is above/below the value in the dictionary.
- Other Parameters:
key (str) – dataset key in storer.
engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess.
Examples
You want to compute a spectrum between 2300 and 2500 cm-1. A line database file is relevant only if its metadata says that 'wavenum_max' > 2300 and 'wavenum_min' < 2500 cm-1:
check_relevancy('path/to/file', relevant_if_metadata_above={'wavenum_max': 2300}, relevant_if_metadata_below={'wavenum_min': 2500})
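The relevancy rule can be written out as a short sketch operating on a plain dictionary of metadata. is_relevant is a hypothetical helper for illustration only; it uses strict comparisons, as in the example above, and does not read any HDF5 file.

```python
def is_relevant(file_metadata, relevant_if_metadata_above, relevant_if_metadata_below):
    """Return True if every metadata value is above/below its threshold."""
    for key, threshold in relevant_if_metadata_above.items():
        if not file_metadata[key] > threshold:
            return False
    for key, threshold in relevant_if_metadata_below.items():
        if not file_metadata[key] < threshold:
            return False
    return True


# Target spectral range: 2300 to 2500 cm-1.
above = {"wavenum_max": 2300}
below = {"wavenum_min": 2500}

# A file covering 2000 to 2400 cm-1 overlaps the range.
overlapping = is_relevant({"wavenum_min": 2000, "wavenum_max": 2400}, above, below)
# A file covering 2600 to 3000 cm-1 lies entirely outside the range.
outside = is_relevant({"wavenum_min": 2600, "wavenum_max": 3000}, above, below)
```

Together the two thresholds express an interval-overlap test: the file must end after the range starts and start before the range ends. In this sketch a missing key raises a KeyError.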
- filter_metadata(arguments, discard_variables=['self', 'verbose'])
Filter arguments (created with locals() at the beginning of the script) to extract metadata. Metadata is stored as attributes in the cached file:
- remove variables in discard_variables
- remove variables that start with '_'
- remove variables whose value is None
- Parameters:
arguments (dict) – dictionary of local variables. For instance: arguments = locals()
discard_variables (list of str) – variable names to discard
- Returns:
metadata – a (new) dictionary built from arguments by removing discard_variables, variables starting with '_', and variables whose value is None
- Return type:
dict
Examples
How to get only the function arguments:
def some_function(*args):
    metadata = locals()  # stores only function arguments because it's the first line
    ...
    metadata = filter_metadata(metadata)
    save_to_hdf(df, fname, metadata=metadata)
    ...
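The three filtering rules listed above amount to a single dictionary comprehension. The function below is a minimal sketch written from that description, named filter_metadata_sketch to make clear it is not the library's own code.

```python
def filter_metadata_sketch(arguments, discard_variables=("self", "verbose")):
    """Return a new dict without discarded, private ('_' prefix) or None entries."""
    return {
        name: value
        for name, value in arguments.items()
        if name not in discard_variables
        and not name.startswith("_")
        and value is not None
    }


raw = {
    "self": object(),      # dropped: listed in discard_variables
    "verbose": True,       # dropped: listed in discard_variables
    "_tmp": 1,             # dropped: starts with '_'
    "isotope": None,       # dropped: value is None
    "molecule": "CO",
    "wavenum_min": 2000,
}
metadata = filter_metadata_sketch(raw)
```

The input dictionary is left untouched, which matches the description of the return value as a new dictionary.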
- get_cache_file(fcache, engine='pytables', verbose=True)
Load HDF5 cache file.
- Parameters:
fcache (str) – file name
- Other Parameters:
verbose (bool) – If >=2, also warns if non numeric values are present (it would make calculations slower)
Notes
We could start using the Feather format instead. See notes in cache_files.py.
- load_h5_cache_file(cachefile, use_cached, columns=None, valid_if_metadata_is={}, relevant_if_metadata_above={}, relevant_if_metadata_below={}, current_version='', last_compatible_version='0.9.1', verbose=True, engine='pytables')
Function to load an h5 cache file.
- Parameters:
cachefile (str) – cache file path
use_cached (str) – use cache file if value is not False:
- if True, use (and generate if it doesn't exist) the cache file.
- if 'regen', delete the cache file (if it exists) so it is regenerated.
- if 'force', use the cache file and raise an error if it doesn't exist.
If using the cache file, check whether the file is deprecated. If it is deprecated, regenerate the file unless 'force' was used (in that case, raise an error).
columns (list, or None) – columns to load
valid_if_metadata_is (dict) – values are compared to cache file attributes. If they don't match, the file is considered deprecated. See use_cached to know how to handle deprecated files.
Note: if the file has extra attributes, they are not compared.
current_version (str) – version is compared to cache file version (part of attributes). If the current version is superior, a simple warning is triggered.
last_compatible_version (str) – if the file version is inferior to this, the file is considered deprecated. See use_cached to know how to handle deprecated files.
relevant_if_metadata_above, relevant_if_metadata_below (dict) – values are compared to cache file attributes. If they don't match, the function returns an IrrelevantFileWarning. For instance, to load a line database file only if it contains wavenumbers between 2300 and 2500 cm-1:
load_h5_cache_file(..., relevant_if_metadata_above={'wav':2300}, relevant_if_metadata_below={'wav':2500})
Note that in such an example, the file data is not read; only the file metadata is. If the metadata does not contain the key (e.g. 'wav'), a DeprecatedFileWarning is raised.
- Returns:
df – None if no cache file was found, or if it was deleted
- Return type:
pandas DataFrame or Vaex Dataframe or None
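The version rules given for current_version and last_compatible_version can be sketched as follows. This is an assumption-laden illustration rather than the library's logic: parse_version and check_version are hypothetical helpers, the returned labels are invented for the example, and only purely numeric dotted versions such as '0.9.1' are handled.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '0.9.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def check_version(file_version, current_version, last_compatible_version="0.9.1"):
    """Classify a cache file by the version under which it was generated."""
    if parse_version(file_version) < parse_version(last_compatible_version):
        return "deprecated"  # generated before the last compatible version
    if parse_version(current_version) > parse_version(file_version):
        return "outdated"    # still usable, but worth a simple warning
    return "ok"


old_file = check_version("0.9.0", current_version="0.15.0")
usable_file = check_version("0.10.0", current_version="0.15.0")
fresh_file = check_version("0.15.0", current_version="0.15.0")
```

Comparing tuples of integers rather than raw strings matters here: as strings, '0.10.0' would sort before '0.9.1' and a compatible file would be wrongly treated as deprecated.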
- save_to_hdf(df, fname, metadata, version=None, key='default', overwrite=True, verbose=True, engine='pytables')
Save energy levels or lines to HDF5 file. Add metadata and version.
- df: a pandas/vaex DataFrame
the data to save. It will be stored under key.
- fname: str
.h5 file where to store.
- metadata: dict
dictionary of values that were used to generate the DataFrame. Metadata will be asked again on file load to ensure it hasn't changed. None values are not stored.
- version: str, or None
file version. If None, the current radis.__version__ is used. On file loading, a warning will be raised if the current version is posterior, or an error if the file version is set to be incompatible.
- key: str
dataset name. Default 'default'
- overwrite: boolean
if True, overwrites file. Else, raise an error if it exists.
- verbose: bool
If >=2, also warns if non numeric values are present (it would make calculations slower)
- engine: 'h5py', 'pytables', 'vaex', 'pytables-fixed'
which HDF5 library to use. Note: 'vaex' uses 'h5py' compatible HDF5. Default 'pytables'.