radis.io.hdf5 module

Created on Tue Jan 26 21:27:15 2021

@author: erwan

class DataFileManager(engine=None)[source]

Bases: object

add_metadata(fname: str, metadata: dict, key='default', create_empty_dataset=False)[source]
Parameters
  • fname (str) – filename

  • metadata (dict) – dictionary of metadata to add in group key

  • key (str) – group to add metadata to. If None, add at root level. If 'default', use engine’s default (/table for 'vaex', df for pytables, root for h5py)

Other Parameters

create_empty_dataset (bool) – if True, create an empty dataset to store the metadata as attribute
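The 'default' key rule described above can be pictured with a small lookup. This is only a sketch: resolve_key and DEFAULT_KEYS are hypothetical names that are not part of radis, and the mapping simply restates the defaults listed for the key parameter.

```python
# Hypothetical helper, not part of radis: restates the documented
# meaning of key='default' for each engine.
DEFAULT_KEYS = {
    "vaex": "/table",   # documented default group for 'vaex'
    "pytables": "df",   # documented default group for pytables
    "h5py": None,       # documented default for h5py: root level
}


def resolve_key(engine, key="default"):
    """Return the group name to use, or None for the root level."""
    if key == "default":
        if engine not in DEFAULT_KEYS:
            raise ValueError(f"Unknown engine: {engine!r}")
        return DEFAULT_KEYS[engine]
    # An explicit group name is used as given; None means root level.
    return key


print(resolve_key("vaex"))              # /table
print(resolve_key("pytables"))          # df
print(resolve_key("h5py"))              # None
print(resolve_key("vaex", key="meta"))  # meta
```

Keeping the per-engine defaults in one table means that callers who pass key='default' never need to know which engine wrote the file.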

cache_file(fname)[source]

Return the corresponding cache file name for fname.

Other Parameters

engine ('h5py', 'pytables', 'vaex') – which HDF5 library to use. Default pytables

combine_temp_batch_files(file, key='default', sort_values=None, delete_nan_columns=True)[source]

Combine all batch files in self._temp_batch_files into one. Removes all batch files.

get_columns(local_file)[source]

Get all column names without loading the whole DataFrame.

classmethod guess_engine(file, verbose=True)[source]

Guess which HDF5 library the file is compatible with.

Note

It still takes about 1 ms for this function to execute. For maximum performance, pass the correct engine directly.

Examples

from radis.io.hdf5 import DataFileManager

file = 'CO.hdf5'
engine = DataFileManager.guess_engine(file)
mgr = DataFileManager(engine)
mgr.read_metadata(file)
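
One way to picture engine guessing is to look at the top-level group names of the file and compare them with each engine's default group. The sketch below makes that assumption explicit. guess_engine_from_groups is a hypothetical function, and this is not necessarily how guess_engine is implemented.

```python
def guess_engine_from_groups(top_level_groups):
    """Guess an engine from top-level HDF5 group names (illustrative only).

    Assumption: a file written with a given engine contains that engine's
    documented default group ('/table' for vaex, 'df' for pytables).
    """
    groups = {name.strip("/") for name in top_level_groups}
    if "table" in groups:
        return "vaex"
    if "df" in groups:
        return "pytables"
    # No recognizable group: assume datasets stored at root level.
    return "h5py"


print(guess_engine_from_groups(["/table"]))      # vaex
print(guess_engine_from_groups(["df"]))          # pytables
print(guess_engine_from_groups(["wav", "int"]))  # h5py
```

If the engine is already known, passing it directly avoids this inspection step altogether.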
has_nan(column)[source]
load(fname, columns=None, lower_bound=[], upper_bound=[], within=[], output='pandas', **store_kwargs)[source]
Other Parameters
  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • output (‘pandas’, ‘vaex’, ‘jax’) – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays.

  • lower_bound (list of tuples [(column, lower_bound), etc.]) –

    lower_bound=[("wav", load_wavenum_min)]

  • upper_bound (list of tuples [(column, upper_bound), etc.]) –

    upper_bound=[("wav", load_wavenum_max)]
    
  • within (list of tuples [(column, within_list), etc.]) –

    within=[("iso", isotope.split(","))]
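
To illustrate what the three filter lists select, here is a sketch that applies them to plain Python rows. filter_rows is a hypothetical helper, not the radis implementation. Whether the bounds are strict or inclusive is an assumption (strict here), and values are compared as strings for within because the documented example passes isotope.split(",").

```python
def filter_rows(rows, lower_bound=(), upper_bound=(), within=()):
    """Keep the rows (dicts) that satisfy every condition.

    Assumptions: bounds are strict (> and <), and 'within' values are
    strings, as produced by isotope.split(",").
    """
    kept = []
    for row in rows:
        ok = (
            all(row[col] > low for col, low in lower_bound)
            and all(row[col] < high for col, high in upper_bound)
            and all(str(row[col]) in allowed for col, allowed in within)
        )
        if ok:
            kept.append(row)
    return kept


rows = [
    {"wav": 2250.0, "iso": 1},
    {"wav": 2350.0, "iso": 1},
    {"wav": 2400.0, "iso": 3},
]
kept = filter_rows(
    rows,
    lower_bound=[("wav", 2300)],
    upper_bound=[("wav", 2500)],
    within=[("iso", "1,2".split(","))],
)
print(kept)  # [{'wav': 2350.0, 'iso': 1}]
```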
    
open(file, mode='w')[source]
read(fname, columns=None, where=None, key='default', none_if_empty=False, **store_kwargs)[source]
Parameters
  • fname (str)

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • where (list of str) –

    filtering conditions. Ex:

    "wav > 2300"
    
Other Parameters

key (str) – group to load from. If None, load from root level. If 'default', use engine’s default (/table for 'vaex', df for pytables, root for h5py)

Return type

pd.DataFrame or vaex.DataFrame

read_filter(fname, columns=None, lower_bound=[], upper_bound=[], within=[], **store_kwargs)[source]
Parameters
  • fname (str)

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • lower_bound (list of tuples [(column, lower_bound), etc.]) –

    lower_bound=[("wav", load_wavenum_min)]

  • upper_bound (list of tuples [(column, upper_bound), etc.]) –

    upper_bound=[("wav", load_wavenum_max)]
    
  • within (list of tuples [(column, within_list), etc.]) –

    within=[("iso", isotope.split(","))]
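
The filter lists can be related to the where strings accepted by read(), such as "wav > 2300". The sketch below shows one possible translation. build_where is a hypothetical helper, and both the strict comparison operators and the exact string syntax are assumptions rather than the behavior of read_filter.

```python
def build_where(lower_bound=(), upper_bound=(), within=()):
    """Translate filter lists into a list of condition strings.

    Assumption: strict inequalities and an 'in' test, written in the
    same style as the "wav > 2300" example given for read().
    """
    conditions = []
    for column, value in lower_bound:
        conditions.append(f"{column} > {value}")
    for column, value in upper_bound:
        conditions.append(f"{column} < {value}")
    for column, values in within:
        conditions.append(f"{column} in {list(values)}")
    return conditions


where = build_where(
    lower_bound=[("wav", 2300)],
    upper_bound=[("wav", 2500)],
    within=[("iso", ["1", "2"])],
)
print(where)
# ['wav > 2300', 'wav < 2500', "iso in ['1', '2']"]
```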
    
read_metadata(fname: str, key='default') → dict[source]
Other Parameters

key (str) – group to read metadata from. If None, read from root level. If 'default', use engine’s default (/table for 'vaex', df for pytables, root for h5py)

to_numpy(df)[source]

Convert DataFrame to numpy

write(file, df, append=False, key='default', format='table', data_columns=['iso', 'wav', 'nu_lines'])[source]

Write dataframe df to file

Parameters

df (DataFrame)

Other Parameters
  • key (str) – group to write to. If None, write at root level. If 'default', use engine’s default (/table for 'vaex', df for pytables, root for h5py)

  • data_columns (list) – only these column names will be searchable directly on disk to load certain lines only. See hdf2df()

class HDF5Manager(**kwargs)[source]

Bases: object

hdf2df(fname, columns=None, isotope=None, load_wavenum_min=None, load_wavenum_max=None, verbose=True, store_kwargs={}, engine='guess', output='pandas')[source]

Load a HDF5 line databank into a Pandas DataFrame.

Adds HDF5 metadata in df.attrs

Parameters
  • fname (str) – HDF5 file name

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • isotope (str) – load only certain isotopes : '2', '1,2', etc. If None, loads everything. Default None.

  • load_wavenum_min, load_wavenum_max (float (cm-1)) – load only lines within this wavenumber range.

Other Parameters
  • store_kwargs (dict) – arguments forwarded to read_hdf()

  • engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess. Note: 'vaex' uses 'h5py' compatible HDF5.

  • output (‘pandas’, ‘vaex’, ‘jax’) – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays.

Returns

df – dataframe containing all lines or energy levels

Return type

pandas DataFrame, or vaex DataFrameLocal, or dictionary of JAX arrays

Examples

from radis.io.hdf5 import hdf2df
from radis.misc.config import getDatabankEntries

path = getDatabankEntries("HITEMP-OH")['path'][0]
df = hdf2df(path)

df = hdf2df(path, columns=['wav', 'int'])

df = hdf2df(path, isotope='2')
df = hdf2df(path, isotope='1,2')

df = hdf2df(path, load_wavenum_min=2300, load_wavenum_max=2500)
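
The isotope and wavenumber arguments above correspond to the lower_bound, upper_bound and within lists documented for load() and read_filter(). The sketch below shows that mapping using the expressions given in those parameter descriptions. selection_to_filters is a hypothetical helper, and skipping arguments that are None is an assumption.

```python
def selection_to_filters(isotope=None, load_wavenum_min=None, load_wavenum_max=None):
    """Build filter lists from hdf2df-style arguments (illustrative only)."""
    lower_bound, upper_bound, within = [], [], []
    if load_wavenum_min is not None:
        lower_bound.append(("wav", load_wavenum_min))
    if load_wavenum_max is not None:
        upper_bound.append(("wav", load_wavenum_max))
    if isotope is not None:
        # '1,2' becomes ['1', '2']
        within.append(("iso", isotope.split(",")))
    return {"lower_bound": lower_bound, "upper_bound": upper_bound, "within": within}


filters = selection_to_filters(
    isotope="1,2", load_wavenum_min=2300, load_wavenum_max=2500
)
print(filters["lower_bound"])  # [('wav', 2300)]
print(filters["upper_bound"])  # [('wav', 2500)]
print(filters["within"])       # [('iso', ['1', '2'])]
```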

Notes

DataFrame metadata in df.attrs is still experimental in pandas and can be lost during groupby, pivot, join or loc operations on the DataFrame. See https://stackoverflow.com/questions/14688306/adding-meta-information-metadata-to-pandas-dataframe

Always check that a metadata key exists before using it.
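
Because metadata can be lost, a defensive lookup is safer than indexing df.attrs directly. The following is a minimal sketch. get_metadata is a hypothetical helper, the "molecule" key is made up for illustration, and SimpleNamespace objects stand in for DataFrames so that the example runs without pandas.

```python
from types import SimpleNamespace


def get_metadata(df, name, default=None):
    """Return df.attrs[name] if present, else default."""
    attrs = getattr(df, "attrs", None) or {}
    return attrs.get(name, default)


# Stand-ins for DataFrames: one with metadata, one where it was lost.
loaded = SimpleNamespace(attrs={"molecule": "OH"})
derived = SimpleNamespace(attrs={})

print(get_metadata(loaded, "molecule"))                      # OH
print(get_metadata(derived, "molecule", default="unknown"))  # unknown
```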

update_pytables_to_vaex(fname, remove_initial=False, verbose=True, key='df')[source]

Convert a HDF5 file generated from PyTables to a Vaex-friendly HDF5 format, preserving metadata

vaexsafe_colname(name)[source]

Replace ‘/’ (forbidden in HDF5 vaex column names) with ‘_’. See https://github.com/radis/radis/issues/473 and https://github.com/vaexio/vaex/issues/1255
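
The replacement described above amounts to a single string substitution. A minimal sketch follows. vaexsafe_colname_sketch is a hypothetical stand-in and the column names are made up for illustration.

```python
def vaexsafe_colname_sketch(name):
    """Replace '/' with '_' so the name can be used as a vaex HDF5 column."""
    return name.replace("/", "_")


print(vaexsafe_colname_sketch("a/b"))  # a_b
print(vaexsafe_colname_sketch("wav"))  # wav
```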