radis.api.hdf5 module

Defines the DataFileManager class

class DataFileManager(engine=None)[source]

Bases: object

add_metadata(fname: str, metadata: dict, key='default', create_empty_dataset=False)[source]
Parameters:
  • fname (str) – filename

  • metadata (dict) – dictionary of metadata to add in group key

  • key (str) – group to add metadata to. If None, add at root level. If 'default', use the engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')

Other Parameters:

create_empty_dataset (bool) – if True, create an empty dataset to store the metadata as attribute

cache_file(fname)[source]

Return the corresponding cache file name for fname.

Other Parameters:

engine ('h5py', 'pytables', 'vaex') – which HDF5 library to use. Default 'pytables'.

combine_temp_batch_files(file, key='default', sort_values=None, delete_nan_columns=True)[source]

Combine all batch files in self._temp_batch_files into one. Removes all batch files.
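A minimal in-memory sketch of these semantics, with plain dict rows standing in for the on-disk batch files (it assumes, as an illustration, that delete_nan_columns drops columns that are entirely NaN; the real method operates on HDF5 files):

```python
from math import isnan

def combine_batches(batches, sort_key=None, delete_nan_columns=True):
    """Sketch of combine_temp_batch_files() semantics: concatenate row
    batches, optionally sort, and drop columns that are entirely NaN."""
    rows = [row for batch in batches for row in batch]  # concatenate all batches
    if sort_key is not None:
        rows.sort(key=lambda r: r[sort_key])
    if delete_nan_columns and rows:
        def is_nan(v):
            return v is None or (isinstance(v, float) and isnan(v))
        # columns where every row is NaN/None get removed
        all_nan = {c for c in rows[0] if all(is_nan(r[c]) for r in rows)}
        rows = [{c: v for c, v in r.items() if c not in all_nan} for r in rows]
    return rows

# Two "temp batch files" as in-memory row lists (hypothetical data):
b1 = [{"wav": 2350.0, "int": 1.0, "extra": None}]
b2 = [{"wav": 2300.0, "int": 0.5, "extra": None}]
combined = combine_batches([b1, b2], sort_key="wav")
# rows come out sorted by "wav"; the all-NaN column "extra" is dropped
```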

get_columns(local_file)[source]

Get all column names (without loading the full DataFrame)

classmethod guess_engine(file, verbose=True)[source]

Guess which HDF5 library the file is compatible with

Note

It still takes about 1 ms for this function to execute. For extreme performance, give the correct engine directly.

Examples

file = 'CO.hdf5'
from radis.io.hdf5 import HDF5Manager
engine = HDF5Manager.guess_engine(file)
mgr = HDF5Manager(engine)
mgr.read_metadata(file)
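As a sketch of why such guessing is cheap but not free: every HDF5 file carries an 8-byte magic signature, so checking "is this HDF5 at all" costs one small read; distinguishing a pytables layout from a vaex-style one requires actually opening the file and inspecting its groups, which is where the ~1 ms goes. A self-contained illustration of the cheap pre-check (not the library's implementation):

```python
import io

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # 8-byte signature at offset 0 of typical HDF5 files

def looks_like_hdf5(stream: io.BufferedIOBase) -> bool:
    """Cheap pre-check before guessing an engine: is this an HDF5 file at all?"""
    return stream.read(8) == HDF5_MAGIC

# Simulate file headers in memory (no real files needed for the sketch):
assert looks_like_hdf5(io.BytesIO(HDF5_MAGIC + b"rest-of-file"))
assert not looks_like_hdf5(io.BytesIO(b"not an hdf5 file"))
```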
has_nan(column)[source]
load(fname, columns=None, lower_bound=[], upper_bound=[], within=[], output='pandas', **store_kwargs)[source]
Other Parameters:
  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • output (‘pandas’, ‘vaex’, ‘jax’) – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays.

  • lower_bound (list of tuples [(column, lower_bound), etc.]) –

    lower_bound=[("wav", load_wavenum_min)]
    
  • upper_bound (list of tuples [(column, upper_bound), etc.]) –

    upper_bound=[("wav", load_wavenum_max)]
    
  • within (list of tuples [(column, within_list), etc.]) –

    within=[("iso", isotope.split(","))]
    
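The three filter arguments compose as an AND of row conditions. A plain-Python sketch of these semantics on dict rows (strict inequalities are an assumption here; the real load() pushes the filters down to the HDF5 engine rather than filtering in memory):

```python
def apply_filters(rows, lower_bound=(), upper_bound=(), within=()):
    """Sketch of load()'s filter semantics on plain dict rows."""
    for col, lo in lower_bound:
        rows = [r for r in rows if r[col] > lo]      # keep rows above the lower bound
    for col, hi in upper_bound:
        rows = [r for r in rows if r[col] < hi]      # keep rows below the upper bound
    for col, allowed in within:
        rows = [r for r in rows if r[col] in allowed]  # keep rows whose value is in the list
    return rows

# Hypothetical line list:
lines = [
    {"wav": 2250.0, "iso": 1},
    {"wav": 2350.0, "iso": 1},
    {"wav": 2350.0, "iso": 3},
    {"wav": 2550.0, "iso": 2},
]
kept = apply_filters(
    lines,
    lower_bound=[("wav", 2300)],
    upper_bound=[("wav", 2500)],
    within=[("iso", [1, 2])],
)
# only the 2350 cm-1 line of isotope 1 passes all three filters
```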
open(file, mode='w')[source]
read(fname, columns=None, where=None, key='default', none_if_empty=False, **store_kwargs)[source]
Parameters:
  • fname (str)

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • where (list of str) –

    filtering conditions. Ex:

    "wav > 2300"
    
Other Parameters:

key (str) – group to load from. If None, load from root level. If 'default', use the engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')

Return type:

pd.DataFrame or vaex.DataFrame
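Each `where` string is a simple "column operator value" condition. A self-contained sketch of how such strings select rows, applied here to plain dicts (the real engines evaluate the conditions on disk without loading the full table):

```python
import operator

# supported comparison operators for this sketch
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq}

def apply_where(rows, where):
    """Sketch of read()'s `where` strings (e.g. "wav > 2300") on dict rows."""
    for cond in where:
        col, op, value = cond.split()   # "wav > 2300" -> ("wav", ">", "2300")
        rows = [r for r in rows if OPS[op](r[col], float(value))]
    return rows

rows = [{"wav": 2250.0}, {"wav": 2400.0}]
selected = apply_where(rows, ["wav > 2300"])
# only the line at 2400 cm-1 satisfies the condition
```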

read_filter(fname, columns=None, lower_bound=[], upper_bound=[], within=[], **store_kwargs)[source]
Parameters:
  • fname (str)

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • lower_bound (list of tuples [(column, lower_bound), etc.]) –

    lower_bound=[("wav", load_wavenum_min)]
    
  • upper_bound (list of tuples [(column, upper_bound), etc.]) –

    upper_bound=[("wav", load_wavenum_max)]
    
  • within (list of tuples [(column, within_list), etc.]) –

    within=[("iso", isotope.split(","))]
    
read_metadata(fname: str, key='default') dict[source]
Other Parameters:

key (str) – group to read metadata from. If None, read from root level. If 'default', use the engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')

to_numpy(df)[source]

Convert DataFrame to numpy

write(file, df, append=False, key='default', format='table', data_columns=['iso', 'wav', 'nu_lines'])[source]

Write dataframe df to file

Parameters:

df (DataFrame)

Other Parameters:
  • key (str) – group to write to. If None, write at root level. If 'default', use the engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')

  • data_columns (list) – only these columns will be searchable directly on disk, so specific lines can be loaded without reading the full file. See hdf2df()

class HDF5Manager(**kwargs)[source]

Bases: object

hdf2df(fname, columns=None, isotope=None, load_wavenum_min=None, load_wavenum_max=None, verbose=True, store_kwargs={}, engine='guess', output='pandas')[source]

Load an HDF5 line databank into a Pandas DataFrame.

Adds HDF5 metadata in df.attrs

Parameters:
  • fname (str) – HDF5 file name

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

  • isotope (str) – load only certain isotopes : '2', '1,2', etc. If None, loads everything. Default None.

  • load_wavenum_min, load_wavenum_max (float (cm-1)) – load only lines within this wavenumber range.

Other Parameters:
  • store_kwargs (dict) – arguments forwarded to read_hdf()

  • engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', the engine is inferred from the file. Note: 'vaex' uses 'h5py'-compatible HDF5.

  • output (‘pandas’, ‘vaex’, ‘jax’) – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays.

Returns:

df – dataframe containing all lines or energy levels

Return type:

pandas Dataframe, or vaex DataFrameLocal, or dictionary of Jax arrays

Examples

path = getDatabankEntries("HITEMP-OH")['path'][0]
df = hdf2df(path)

df = hdf2df(path, columns=['wav', 'int'])

df = hdf2df(path, isotope='2')
df = hdf2df(path, isotope='1,2')

df = hdf2df(path, load_wavenum_min=2300, load_wavenum_max=2500)

Notes

DataFrame metadata in df.attrs is still experimental in Pandas and can be lost during groupby, pivot, join or loc operations on the DataFrame. See https://stackoverflow.com/questions/14688306/adding-meta-information-metadata-to-pandas-dataframe

Always check for existence!

update_pytables_to_vaex(fname, remove_initial=False, verbose=True, key='df')[source]

Convert an HDF5 file generated by PyTables to a vaex-friendly HDF5 format, preserving metadata

vaexsafe_colname(name)[source]

Replace ‘/’ (forbidden in vaex HDF5 column names) with ‘_’. See https://github.com/radis/radis/issues/473 and https://github.com/vaexio/vaex/issues/1255
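The substitution itself is a one-liner; a sketch of the behavior described above (illustrative, not the library's exact implementation):

```python
def vaexsafe_colname_sketch(name: str) -> str:
    """Sketch of vaexsafe_colname(): '/' is not allowed in vaex HDF5
    column names, so replace it with '_'."""
    return name.replace("/", "_")

# e.g. a hypothetical column named after a quotient quantity:
safe = vaexsafe_colname_sketch("gamma_air/gamma_self")
# -> "gamma_air_gamma_self"
```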