radis.api.cache_files module

Tools to deal with HDF5 cache files. HDF5 cache files are used to cache Energy Database files and Line Database files, and yield a much faster access time.

Routine Listing

See also

hit2df(), cdsd2df()

cache_file_name(fname, engine='pytables')[source]
check_cache_file(fcache, use_cached=True, expected_metadata={}, compare_as_close=[], verbose=True, engine='guess')[source]

Quick function that checks the status of a cache file generated by RADIS.

The function first checks whether fcache exists. What it does next depends on the value of use_cached:

  • if True, check that the file exists and remove it if it is not valid.

  • if 'regen', delete the cache file even if it is valid, so it can be regenerated later.

  • if 'force', raise an error if the file doesn't exist.

Then check whether the file is deprecated (only the attributes are read, the file is never fully loaded). Deprecation is checked by check_not_deprecated(), which compares the file metadata with expected_metadata.

  • if deprecated, delete the file so it can be regenerated later, unless 'force' was used.

Parameters:
  • fcache (str) – cache file name

  • use_cached (True, False, 'force', 'regen') – see notes above. Default True.

  • expected_metadata (dict) – attributes to check

  • compare_as_close (list of keys) – compare with np.isclose(a,b) rather than a==b

  • verbose (boolean) – print status messages

  • engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess.

Returns:

whether the file was valid or not (and was removed). Raises a DeprecatedFileWarning for an invalid file in mode 'force'. The error can be caught by the parent function.

Return type:

None
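The decision flow described for check_cache_file can be summarised in a small stand-alone sketch. It is not the RADIS implementation: the function name `sketch_check_cache_file`, the `is_deprecated` callback, the returned status strings, and the use of `RuntimeError` in place of `DeprecatedFileWarning` are all assumptions made for illustration. The behaviour for `use_cached=False` is also assumed, since it is not described here.

```python
import os
import tempfile


def sketch_check_cache_file(fcache, use_cached=True, is_deprecated=lambda path: False):
    """Illustrative stand-in for the flow described above (not the RADIS code).

    Returns a short string naming the action taken, so the flow is easy to inspect.
    """
    if not use_cached:
        return "ignored"  # assumption: with use_cached=False the cache is left alone

    if not os.path.exists(fcache):
        if use_cached == "force":
            raise FileNotFoundError(f"cache file {fcache} is required but missing")
        return "missing"

    if use_cached == "regen":
        os.remove(fcache)  # deleted even if valid, to be regenerated later
        return "removed"

    if is_deprecated(fcache):
        if use_cached == "force":
            # RADIS raises DeprecatedFileWarning here; RuntimeError is a placeholder
            raise RuntimeError(f"cache file {fcache} is deprecated")
        os.remove(fcache)  # deprecated file is deleted so it can be regenerated
        return "removed"

    return "valid"


# Usage: an empty throwaway file plays the role of the cache file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.h5")
    open(path, "w").close()
    first = sketch_check_cache_file(path, use_cached=True)
    second = sketch_check_cache_file(path, use_cached="regen")
    third = sketch_check_cache_file(path, use_cached=True)
```

In this sketch the three calls report a valid file, a removed file, and then a missing file, which mirrors the "delete now, regenerate later" behaviour described above.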

check_not_deprecated(file, metadata_is={}, metadata_keys_contain=[], compare_as_close=[], current_version=None, last_compatible_version='0.9.1', engine='guess')[source]

Make sure cache file is not deprecated: checks that metadata is the same, and that the version under which the file was generated is valid.

Parameters:
  • file (str) – a .h5 cache file for Energy Levels

  • metadata_is (dict) – expected values for these variables in the file metadata. If the values don't match, a DeprecatedFileWarning() error is raised. If the file metadata contains additional keys/values, no error is raised.

  • metadata_keys_contain (list) – expected list of variables in the file metadata. If the keys are not there, a DeprecatedFileWarning() error is raised.

  • compare_as_close (list of keys) – compare with np.isclose(a,b) rather than a==b

Other Parameters:
  • current_version (str, or None) – current version number. If the file was generated in a previous version a warning is raised. If None, current version is read from radis.__version__.

  • last_compatible_version (str) – If the file was generated in a non-compatible version, an error is raised. (Useful parameter to force regeneration of certain cache files after a breaking change in a new version.)

  • engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess.
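As a rough illustration of the metadata comparison described above, the sketch below checks expected values against a plain dictionary standing in for the file attributes. It is not the RADIS code: `sketch_metadata_mismatches` is a hypothetical helper, `math.isclose` is used as a standard-library stand-in for `np.isclose`, and treating a missing key as a mismatch is an assumption.

```python
import math


def sketch_metadata_mismatches(file_metadata, metadata_is, compare_as_close=()):
    """Return the keys whose expected value is missing or differs in file_metadata."""
    mismatches = []
    for key, expected in metadata_is.items():
        if key not in file_metadata:
            mismatches.append(key)  # assumption: a missing key counts as a mismatch
            continue
        found = file_metadata[key]
        if key in compare_as_close:
            # tolerant comparison; stand-in for np.isclose(a, b)
            same = math.isclose(found, expected, rel_tol=1e-5)
        else:
            same = found == expected
        if not same:
            mismatches.append(key)
    return mismatches


# "extra" is only present in the file: additional keys are not compared
file_metadata = {"molecule": "CO2", "wavenum_min": 2300.0000001, "extra": 1}
expected = {"molecule": "CO2", "wavenum_min": 2300.0}

strict = sketch_metadata_mismatches(file_metadata, expected)
tolerant = sketch_metadata_mismatches(
    file_metadata, expected, compare_as_close=["wavenum_min"]
)
```

Here the strict comparison flags 'wavenum_min' because of a tiny floating-point difference, while listing that key in compare_as_close accepts it. This is the kind of case the compare_as_close parameter appears to be meant for.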

check_relevancy(file, relevant_if_metadata_above, relevant_if_metadata_below, verbose=True, key='default', engine='guess')[source]

Make sure cache file is relevant.

Use case: checks that wavenumber min and wavenumber max in metadata are relevant for the specified spectral range.

Parameters:
  • file (str) – a .h5 line database cache file

  • relevant_if_metadata_above, relevant_if_metadata_below (dict) – only load the cached file if it is relevant: the file is relevant if the file metadata value for each key of the dictionary is above/below the value in the dictionary.

Other Parameters:
  • key (str) – dataset key in storer.

  • engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess.

Examples

You want to compute a spectrum in between 2300 and 2500 cm-1. A line database file is relevant only if its metadata says that 'wavenum_max' > 2300 and 'wavenum_min' < 2500 cm-1.

check_relevancy('path/to/file', relevant_if_metadata_above={'wavenum_max': 2300},
                relevant_if_metadata_below={'wavenum_min': 2500})
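The relevancy rule in this example can be written out as a short stand-alone sketch. `sketch_is_relevant` is a hypothetical helper operating on a plain dictionary instead of HDF5 attributes, and the strict inequalities follow the wording of the example. The real function may treat boundary values or missing keys differently.

```python
def sketch_is_relevant(file_metadata, relevant_if_metadata_above, relevant_if_metadata_below):
    """True if every listed metadata value lies on the required side of its threshold."""
    for key, threshold in relevant_if_metadata_above.items():
        if not file_metadata[key] > threshold:
            return False
    for key, threshold in relevant_if_metadata_below.items():
        if not file_metadata[key] < threshold:
            return False
    return True


# Target spectral range: 2300 to 2500 cm-1
above = {"wavenum_max": 2300}
below = {"wavenum_min": 2500}

# File covering 2000-2400 cm-1 overlaps the target range
overlapping = sketch_is_relevant({"wavenum_min": 2000, "wavenum_max": 2400}, above, below)
# File covering 1000-2200 cm-1 ends before the target range starts
too_low = sketch_is_relevant({"wavenum_min": 1000, "wavenum_max": 2200}, above, below)
```

The two conditions together express an interval-overlap test: the file must end above the start of the requested range and start below its end.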

filter_metadata(arguments, discard_variables=['self', 'verbose'])[source]

Filter arguments (created with locals() at the beginning of the script) to extract metadata.

Metadata is stored as attributes in the cached file:

  • remove variables in discard_variables

  • remove variables that start with '_'

  • remove variables whose value is None

Parameters:
  • arguments (dict) –

    list of local variables. For instance:

    arguments = locals()
    
  • discard_variables (list of str) – variable names to discard

Returns:

metadata – a (new) dictionary built from arguments by removing discard_variables, variables starting with '_', and variables whose value is None

Return type:

dict

Examples

How to get only the function arguments:

def some_function(*args):
    metadata = locals()     # stores only function arguments because it's the first line

    ...

    metadata = filter_metadata(metadata)
    save_to_hdf(df, fname, metadata=metadata)

    ...
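The three filtering rules listed above fit in a single dictionary comprehension. The sketch below is a re-implementation for illustration only, not the RADIS source; `sketch_filter_metadata` and `load_lines` are hypothetical names.

```python
def sketch_filter_metadata(arguments, discard_variables=("self", "verbose")):
    """Keep only entries that qualify as metadata under the three rules above."""
    return {
        name: value
        for name, value in arguments.items()
        if name not in discard_variables   # rule 1: explicitly discarded names
        and not name.startswith("_")       # rule 2: private variables
        and value is not None              # rule 3: unset values
    }


def load_lines(fname, molecule="CO2", wavenum_min=None, verbose=True):
    # First statement of the function: only the arguments exist at this point
    arguments = locals()
    return sketch_filter_metadata(arguments)


metadata = load_lines("lines.par")
```

In this sketch 'verbose' is dropped because it is in discard_variables and 'wavenum_min' because its value is None, leaving only 'fname' and 'molecule'.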
get_cache_file(fcache, engine='pytables', verbose=True)[source]

Load HDF5 cache file.

Parameters:

fcache (str) – file name

Other Parameters:

verbose (bool) – If >=2, also warns if non-numeric values are present (it would make calculations slower)

Notes

We could start using the Feather format instead. See notes in cache_files.py

load_h5_cache_file(cachefile, use_cached, columns=None, valid_if_metadata_is={}, relevant_if_metadata_above={}, relevant_if_metadata_below={}, current_version='', last_compatible_version='0.9.1', verbose=True, engine='pytables')[source]

Load an h5 cache file.

Parameters:
  • cachefile (str) – cache file path

  • use_cached (bool or str) – use cache file if value is not False:

    • if True, use the cache file (and generate it if it doesn't exist).

    • if 'regen', delete the cache file (if it exists) so it is regenerated.

    • if 'force', use the cache file and raise an error if it doesn't exist.

    When using the cache file, check whether it is deprecated. If it is, regenerate the file unless 'force' was used (in that case, raise an error).

  • columns (list, or None) – columns to load

  • valid_if_metadata_is (dict) – values are compared to cache file attributes. If they don't match, the file is considered deprecated. See use_cached to know how to handle deprecated files.

    Note

    if the file has extra attributes they are not compared

  • current_version (str) – version is compared to cache file version (part of attributes). If the current version is later than the file version, a simple warning is triggered.

  • last_compatible_version (str) – if the file version is earlier than this, the file is considered deprecated. See use_cached to know how to handle deprecated files.

  • relevant_if_metadata_above, relevant_if_metadata_below (dict) – values are compared to cache file attributes. If they don't match, the function raises an IrrelevantFileWarning. For instance, to load a line database file only if it contains wavenumbers between 2300 and 2500 cm-1:

    load_h5_cache_file(..., relevant_if_metadata_above={'wav':2300},
                       relevant_if_metadata_below={'wav':2500})

    Note that in such an example, the file data is not read. Only the file metadata is. If the metadata does not contain the key (e.g. 'wav'), a DeprecatedFileWarning is raised.

Returns:

df – None if no cache file was found, or if it was deleted

Return type:

pandas DataFrame or Vaex DataFrame or None
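The role of current_version and last_compatible_version can be illustrated with a minimal version check. This is a sketch under stated assumptions, not the RADIS logic: `parse_version` and `sketch_version_status` are hypothetical helpers, versions are assumed to be purely numeric dotted strings, and the returned labels stand in for the warning and deprecation behaviour described above.

```python
def parse_version(text):
    """Turn '0.9.1' into (0, 9, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def sketch_version_status(file_version, current_version, last_compatible_version="0.9.1"):
    """Classify a cache file by the version under which it was generated."""
    file_v = parse_version(file_version)
    if file_v < parse_version(last_compatible_version):
        return "deprecated"  # generated before the last compatible version
    if file_v < parse_version(current_version):
        return "outdated"    # still usable; a simple warning would be issued
    return "ok"


old = sketch_version_status("0.9.0", current_version="0.13.0")
usable = sketch_version_status("0.9.20", current_version="0.13.0")
current = sketch_version_status("0.13.0", current_version="0.13.0")
```

Comparing integer tuples rather than raw strings matters here: as strings, '0.9.20' sorts after '0.13.0', which would misclassify the second case.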

save_to_hdf(df, fname, metadata, version=None, key='default', overwrite=True, verbose=True, engine='pytables')[source]

Save energy levels or lines to HDF5 file. Add metadata and version.

Parameters:
  • df (pandas/vaex DataFrame) – data to store; it is written under the dataset name given by key.

  • fname (str) – .h5 file where to store the data.

  • metadata (dict) – dictionary of values that were used to generate the DataFrame. Metadata will be asked again on file load to ensure it hasn't changed. None values are not stored.

  • version (str, or None) – file version. If None, the current radis.__version__ is used. On file loading, a warning will be raised if the current version is later, or an error if the file version is set to be incompatible.

  • key (str) – dataset name. Default 'default'.

  • overwrite (boolean) – if True, overwrites file. Else, raise an error if it exists.

  • verbose (bool) – If >=2, also warns if non-numeric values are present (it would make calculations slower)

  • engine ('h5py', 'pytables', 'vaex', 'pytables-fixed') – which HDF5 library to use. Note: 'vaex' uses 'h5py' compatible HDF5. Default 'pytables'.