Convert a CDSD-HITEMP [1]_ or CDSD-4000 [2]_ file to a Pandas dataframe.
Parameters:
fname (str) – CDSD file name
version (str (‘4000’, ‘hitemp’)) – CDSD version
cache (boolean, or ‘regen’) – if True, a pandas-readable HDF5 file is generated on first access
and reused afterwards. This saves the datatype cast and conversion and
improves performance a lot (but later changes in the database are not
taken into account). If False, no cache file is used. If ‘regen’, the cache
file is regenerated. Default True.
load_columns (list) – columns to load. If None, all columns are loaded.
Note
This is only relevant when loading from a cache file. To generate
the cache file, all columns are loaded anyway.
Other Parameters:
drop_non_numeric (boolean) – if True, non-numeric columns are dropped. This improves performance,
but make sure all the columns you need are converted to numeric formats
beforehand. Default True. Note that if a cache file is loaded, it
will be left untouched.
load_wavenum_min, load_wavenum_max (float) – if not None, only load the cached file if it contains data for
wavenumbers above/below the specified value. See :py:func:`~radis.api.cache_files.load_h5_cache_file`.
Default None.
engine (‘pytables’, ‘vaex’) – format of the HDF5 cache file. Default ‘pytables’.
Returns:
df – dataframe containing all lines and parameters
Performance: I had huge performance trouble with this function, because the files are
huge (500k lines) and the format is too special (no space between numbers…)
to apply optimized methods such as pandas’s. Reading line by line, using struct
to parse each line, isn’t so bad. However, we waste time determining
what every line is. I ended up using numpy’s fromfile function,
no longer treating \n (line return) as a special character, followed by a second call
to numpy to cast the data to the correct format. That ended up being twice as fast.
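The two-pass idea described above (read the whole file as raw fixed-width byte fields, with the line return treated as one more field, then cast each column) can be sketched as follows. This is only an illustration under stated assumptions: the column layout, the field names and the helper `parse_fixed_width` are hypothetical and do not reproduce the real CDSD format or the RADIS implementation.

```python
import os
import tempfile

import numpy as np

# Hypothetical fixed-width layout (NOT the real CDSD columns):
# 2 chars molecule id, 1 char isotope, 11 chars wavenumber, 9 chars intensity,
# then the line return, read as just another 1-byte field.
RAW_DTYPE = np.dtype(
    [("id", "S2"), ("iso", "S1"), ("wav", "S11"), ("int", "S9"), ("_newline", "S1")]
)
# Target numeric type of each column we want to keep
CAST = {"id": np.int64, "iso": np.int64, "wav": np.float64, "int": np.float64}


def parse_fixed_width(fname):
    """Read all records in one call, then cast each text column to a numeric type."""
    # First pass: raw bytes only, no per-line parsing
    raw = np.fromfile(fname, dtype=RAW_DTYPE)
    # Second pass: one vectorized cast per column; the newline field is dropped
    return {name: raw[name].astype(kind) for name, kind in CAST.items()}


# Two sample records with no separator between the fields
sample = b"1212000.1234561.234E-25\n1222000.6543219.876E-30\n"
with tempfile.NamedTemporaryFile(suffix=".par", delete=False) as f:
    f.write(sample)
    path = f.name
columns = parse_fixed_width(path)
os.remove(path)
```

The resulting dictionary of arrays could then be handed to `pandas.DataFrame`. The sketch assumes every record has exactly the same byte length, which is what makes it possible to ignore line boundaries.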
Convert a HITRAN/HITEMP [1]_ file to a Pandas dataframe.
Parameters:
fname (str) – HITRAN-HITEMP file name
cache (boolean, or 'regen' or 'force') – if True, a pandas-readable HDF5 file is generated on first access
and reused afterwards. This saves the datatype cast and conversion and
improves performance a lot (but later changes in the database are not
taken into account). If False, no cache file is used. If 'regen', the cache
file is regenerated. Default True.
Other Parameters:
drop_non_numeric (boolean) – if True, non-numeric columns are dropped. This improves performance,
but make sure all the columns you need are converted to numeric formats
beforehand. Default True. Note that if a cache file is loaded, it
will be left untouched.
load_wavenum_min, load_wavenum_max (float) – if not None, only load the cached file if it contains data for
wavenumbers above/below the specified value. See :py:func:`~radis.api.cache_files.load_h5_cache_file`.
Default None.
engine (‘pytables’, ‘vaex’) – format of the HDF5 cache file. Default ‘pytables’.
parse_quanta (bool) – if True, parse local & global quanta (required to identify lines
for non-LTE calculations; note that lines are sometimes not labelled).
output (str) – output format of the data: a pandas DataFrame or a vaex DataFrame
Returns:
df – dataframe containing all lines and parameters
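The meaning of the `cache` values documented above (True, False, 'regen') can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `load_with_cache` and `slow_parse` are hypothetical helpers, the `.cache` suffix is invented, and `pickle` stands in for the HDF5 cache format. The 'force' value is not covered because its behaviour is not described here.

```python
import os
import pickle
import tempfile


def load_with_cache(fname, parse, cache=True):
    """Hypothetical helper mirroring the documented values of ``cache``."""
    cache_file = fname + ".cache"  # assumed naming, for illustration only
    if cache == "regen" and os.path.exists(cache_file):
        os.remove(cache_file)  # discard the old cache so it is rebuilt below
    if cache and os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)  # fast path: no parsing at all
    data = parse(fname)  # slow path: full parse of the source file
    if cache:
        with open(cache_file, "wb") as f:
            pickle.dump(data, f)  # stored for the next access
    return data


calls = []  # records every real parse, to show when the cache is used


def slow_parse(fname):
    calls.append(fname)
    with open(fname) as f:
        return [float(line) for line in f]


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "lines.txt")
    with open(source, "w") as f:
        f.write("1.0\n2.0\n")
    first = load_with_cache(source, slow_parse)  # parses and writes the cache
    second = load_with_cache(source, slow_parse)  # served from the cache
    third = load_with_cache(source, slow_parse, cache="regen")  # parses again
    n_calls = len(calls)
```

Because the second call never touches the source file, edits made to that file after the first access go unnoticed until the cache is regenerated, which is the caveat stated in the `cache` description.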