radis.api.hdf5 module¶
Defines the DataFileManager class.
- class DataFileManager(engine=None)[source]¶
Bases: object
- add_metadata(fname: str, metadata: dict, key='default', create_empty_dataset=False)[source]¶
- Parameters:
fname (str) – filename
metadata (dict) – dictionary of metadata to add in group key
key (str) – group to add metadata to. If None, add at root level. If 'default', use engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')
- Other Parameters:
create_empty_dataset (bool) – if True, create an empty dataset to store the metadata as attribute
- cache_file(fname)[source]¶
Return the corresponding cache file name for fname.
- Other Parameters:
engine ('h5py', 'pytables', 'vaex') – which HDF5 library to use. Default 'pytables'
- combine_temp_batch_files(file, key='default', sort_values=None, delete_nan_columns=True)[source]¶
Combine all batch files in self._temp_batch_files into one. Removes all batch files.
- classmethod guess_engine(file, verbose=True)[source]¶
Guess which HDF5 library file is compatible with.
Note: it still takes about 1 ms for this function to execute. For extreme performance you want to give the correct engine directly.
Examples
file = 'CO.hdf5'
from radis.io.hdf5 import HDF5Manager
engine = HDF5Manager.guess_engine(file)
mgr = HDF5Manager(engine)
mgr.read_metadata(file)
- load(fname, columns=None, lower_bound=[], upper_bound=[], within=[], output='pandas', **store_kwargs)[source]¶
- Other Parameters:
columns (list of str) – list of columns to load. If None, returns all columns in the file.
output ('pandas', 'vaex', 'jax') – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays.
lower_bound (list of tuples [(column, lower_bound), etc.]) – ex: lower_bound=[("wav", load_wavenum_min)]
upper_bound (list of tuples [(column, upper_bound), etc.]) – ex: upper_bound=[("wav", load_wavenum_max)]
within (list of tuples [(column, within_list), etc.]) – ex: within=[("iso", isotope.split(","))]
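The three filter arguments share one semantics. A minimal pure-Python sketch follows, as an illustration only: the real load() pushes these filters down to the HDF5 engine, strict comparison is assumed for the bounds, and the column names are examples.

```python
# Minimal sketch of the lower_bound / upper_bound / within semantics,
# applied to a plain dict of columns. Illustration only: the real
# load() filters on disk via the HDF5 engine; strict bounds assumed.

def filter_rows(data, lower_bound=[], upper_bound=[], within=[]):
    """Return indices of rows in `data` (dict of column -> list) passing all filters."""
    n_rows = len(next(iter(data.values())))
    keep = []
    for i in range(n_rows):
        ok = all(data[col][i] > lo for col, lo in lower_bound)
        ok = ok and all(data[col][i] < up for col, up in upper_bound)
        ok = ok and all(str(data[col][i]) in allowed for col, allowed in within)
        if ok:
            keep.append(i)
    return keep

# e.g. keep lines with 2300 < wav < 2500 and isotope in {1, 2}
data = {"wav": [2100.0, 2350.0, 2600.0], "iso": [1, 2, 3]}
rows = filter_rows(
    data,
    lower_bound=[("wav", 2300)],
    upper_bound=[("wav", 2500)],
    within=[("iso", "1,2".split(","))],
)  # -> [1]
```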
- read(fname, columns=None, where=None, key='default', none_if_empty=False, **store_kwargs)[source]¶
- Parameters:
fname (str)
columns (list of str) – list of columns to load. If None, returns all columns in the file.
where (list of str) – filtering conditions. Ex: "wav > 2300"
- Other Parameters:
key (str) – group to load from. If None, load from root level. If 'default', use engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')
- Return type:
pd.DataFrame or vaex.DataFrame
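The where strings follow a "column operator value" condition style. A tiny evaluator sketches that syntax; this is an assumption-laden illustration (single numeric comparison only), while the real engines compile and apply such conditions on disk.

```python
import operator

# Sketch of the `where` condition syntax, e.g. "wav > 2300".
# Handles one whitespace-separated numeric comparison only; the
# real engines (pytables, vaex) evaluate conditions on disk.
OPS = {">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le, "==": operator.eq}

def row_matches(row, condition):
    """Check one row (dict of column -> value) against a condition string."""
    col, op, value = condition.split()
    return OPS[op](row[col], float(value))

row = {"wav": 2350.0}
row_matches(row, "wav > 2300")   # -> True
row_matches(row, "wav <= 2300")  # -> False
```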
- read_filter(fname, columns=None, lower_bound=[], upper_bound=[], within=[], **store_kwargs)[source]¶
- Parameters:
fname (str)
columns (list of str) – list of columns to load. If None, returns all columns in the file.
lower_bound (list of tuples [(column, lower_bound), etc.]) – ex: lower_bound=[("wav", load_wavenum_min)]
upper_bound (list of tuples [(column, upper_bound), etc.]) – ex: upper_bound=[("wav", load_wavenum_max)]
within (list of tuples [(column, within_list), etc.]) – ex: within=[("iso", isotope.split(","))]
- read_metadata(fname: str, key='default') dict [source]¶
- Other Parameters:
key (str) – group where to read metadata from. If None, read from root level. If 'default', use engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')
- write(file, df, append=False, key='default', format='table', data_columns=['iso', 'wav', 'nu_lines'])[source]¶
Write dataframe df to file.
- Parameters:
df (DataFrame)
- Other Parameters:
key (str) – group to write to. If None, write at root level. If 'default', use engine’s default ('/table' for 'vaex', 'df' for 'pytables', root for 'h5py')
data_columns (list) – only these column names will be searchable directly on disk to load certain lines only. See hdf2df()
- hdf2df(fname, columns=None, isotope=None, load_wavenum_min=None, load_wavenum_max=None, verbose=True, store_kwargs={}, engine='guess', output='pandas')[source]¶
Load an HDF5 line databank into a Pandas DataFrame.
Adds HDF5 metadata in df.attrs.
- Parameters:
fname (str) – HDF5 file name
columns (list of str) – list of columns to load. If None, returns all columns in the file.
isotope (str) – load only certain isotopes: '2', '1,2', etc. If None, loads everything. Default None.
load_wavenum_min, load_wavenum_max (float (cm-1)) – load only a specific wavenumber range.
- Other Parameters:
store_kwargs (dict) – arguments forwarded to read_hdf()
engine ('h5py', 'pytables', 'vaex', 'guess') – which HDF5 library to use. If 'guess', try to guess. Note: 'vaex' uses 'h5py'-compatible HDF5.
output ('pandas', 'vaex', 'jax') – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays.
- Returns:
df – dataframe containing all lines or energy levels
- Return type:
pandas DataFrame, vaex DataFrameLocal, or dictionary of Jax arrays
Examples
path = getDatabankEntries("HITEMP-OH")['path'][0]
df = hdf2df(path)
df = hdf2df(path, columns=['wav', 'int'])
df = hdf2df(path, isotope='2')
df = hdf2df(path, isotope='1,2')
df = hdf2df(path, load_wavenum_min=2300, load_wavenum_max=2500)
Notes
DataFrame metadata in df.attrs is still experimental in Pandas and can be lost during groupby, pivot, join or loc operations on the DataFrame. See https://stackoverflow.com/questions/14688306/adding-meta-information-metadata-to-pandas-dataframe
Always check for existence!
- update_pytables_to_vaex(fname, remove_initial=False, verbose=True, key='df')[source]¶
Convert an HDF5 file generated from PyTables to a Vaex-friendly HDF5 format, preserving metadata.
- vaexsafe_colname(name)[source]¶
Replace ‘/’ (forbidden in vaex HDF5 column names) with ‘_’. See https://github.com/radis/radis/issues/473 and https://github.com/vaexio/vaex/issues/1255
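Assuming a straight character substitution (as the linked issues suggest), the behavior can be sketched as:

```python
def vaexsafe_colname(name):
    # '/' is the HDF5 path separator, so vaex forbids it in column
    # names; substitute '_' (see radis#473, vaex#1255).
    return name.replace("/", "_")

vaexsafe_colname("A/ul")  # hypothetical column name -> "A_ul"
```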