radis.io.geisa module

Summary

GEISA database parser


class GEISADatabaseManager(name, molecule, local_databases, engine='default', verbose=True, chunksize=100000, parallel=True)[source]

Bases: DatabaseManager

fetch_urlnames()[source]

Requires an internet connection.

parse_to_local_file(opener, urlname, local_file, pbar_active=True, pbar_t0=0, pbar_Ntot_estimate_factor=None, pbar_Nlines_already=0, pbar_last=True)[source]

Uncompress urlname into local_file and add metadata.

Parameters
  • opener (an opener with an .open() command)

  • gfile (file handler. Filename: for info)

register()[source]

Register the database in ~/radis.json.

columns_GEISA = {'A': ('a10', <class 'float'>, 'Einstein A coefficient', 's-1'), 'El': ('a10', <class 'float'>, 'lower-state energy', 'cm-1'), 'Pshft': ('a9', <class 'float'>, 'air pressure-induced line shift at 296K', 'cm-1.atm-1'), 'Pshfts': ('a8', <class 'float'>, 'self pressure-induced line shift at 296K', 'cm-1.atm-1'), 'Tdpair': ('a4', <class 'float'>, 'temperature-dependance exponent for Gamma air', ''), 'Tdpnself': ('a4', <class 'float'>, 'temperature-dependance exponent for self pressure-induced line shift', ''), 'Tdppair': ('a6', <class 'float'>, 'temperature-dependance exponent for air pressure-induced line shift', ''), 'Tdpself': ('a4', <class 'float'>, 'temperature-dependance exponent for self-broadening halfwidth', ''), 'airbrd': ('a6', <class 'float'>, 'air-broadened half-width at 296K', 'cm-1.atm-1'), 'globl': ('a25', <class 'str'>, 'electronic and vibrational global lower quanta', ''), 'globu': ('a25', <class 'str'>, 'electronic and vibrational global upper quanta', ''), 'id': ('a2', <class 'int'>, 'Hitran molecular number', ''), 'idG': ('a3', <class 'str'>, 'Internal GEISA code for the data identification', ''), 'ierrA': ('a10', <class 'float'>, 'estimated accuracy on the line position', 'cm-1'), 'ierrB': ('a11', <class 'str'>, 'estimated accuracy on the intensity of the line', 'cm-1/(molecule/cm-2)'), 'ierrC': ('a6', <class 'float'>, 'estimated accuracy on the air collision halfwidth', 'cm-1.atm-1'), 'ierrF': ('a4', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the air-broadening halfwidth', ''), 'ierrN': ('a7', <class 'float'>, 'estimated accuracy on the self-broadened at 296K', 'cm-1.atm-1'), 'ierrO': ('a9', <class 'float'>, 'estimated accuracy on the air pressure shift of the line transition at 296K', 'cm-1.atm-1'), 'ierrR': ('a6', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the air pressure shift', ''), 'ierrS': ('a4', <class 'float'>, 'estimated accuracy on the 
temperature dependence coefficient of the self-broadening halfwidth', ''), 'ierrT': ('a8', <class 'float'>, 'estimated accuracy on the self-pressure shift of the line transition at 296K', 'cm-1.atm-1'), 'ierrU': ('a4', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the self pressure shift', ''), 'int': ('a11', <class 'str'>, 'intensity at 296K', 'cm-1/(molecule/cm-2)'), 'iso': ('a1', <class 'int'>, 'Hitran isotope number', ''), 'isoG': ('a3', <class 'int'>, 'GEISA isotope number', ''), 'locl': ('a15', <class 'str'>, 'electronic and vibrational local lower quanta', ''), 'locu': ('a15', <class 'str'>, 'electronic and vibrational local upper quanta', ''), 'mol': ('a3', <class 'int'>, 'GEISA molecular number', ''), 'selbrd': ('a7', <class 'float'>, 'self-broadened half-width at 296K', 'cm-1.atm-1'), 'wav': ('a12', <class 'float'>, 'vacuum wavenumber', 'cm-1')}[source]

Parsing order of the GEISA 2020 format.

Type

OrderedDict
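
As a minimal sketch of how such an ordered column specification can drive a fixed-width parser, the snippet below slices a synthetic line using four entries copied from columns_GEISA. It assumes that a format code such as 'a12' means a field of 12 characters and that the fields appear in the order shown in the fetch_geisa example output; parse_fixed_width and the sample line are hypothetical and are not part of RADIS.

```python
from collections import OrderedDict

# Four entries copied from columns_GEISA; the order used here is an assumption.
columns_subset = OrderedDict(
    [
        ("wav", ("a12", float, "vacuum wavenumber", "cm-1")),
        ("int", ("a11", str, "intensity at 296K", "cm-1/(molecule/cm-2)")),
        ("airbrd", ("a6", float, "air-broadened half-width at 296K", "cm-1.atm-1")),
        ("El", ("a10", float, "lower-state energy", "cm-1")),
    ]
)


def parse_fixed_width(line, columns):
    """Hypothetical helper: slice `line` field by field, in dictionary order."""
    record = {}
    position = 0
    for name, (fmt, dtype, _description, _unit) in columns.items():
        width = int(fmt[1:])  # assumption: 'a12' -> 12 characters
        raw = line[position : position + width]
        record[name] = dtype(raw.strip())  # cast with the declared Python type
        position += width
    return record


# Synthetic line (made-up values), right-aligned to the declared widths.
line = (
    "2143.271100".rjust(12)
    + "4.450D-19".rjust(11)
    + "0.0597".rjust(6)
    + "3.8450".rjust(10)
)
record = parse_fixed_width(line, columns_subset)
print(record)
```

The 'int' field is kept as a string here because the table declares it with type str; any later conversion to a number is left out of this sketch.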

fetch_geisa(molecule, local_databases=None, databank_name='GEISA-{molecule}', isotope=None, load_wavenum_min=None, load_wavenum_max=None, columns=None, cache=True, verbose=True, chunksize=100000, clean_cache_files=True, return_local_path=False, engine='default', output='pandas', parallel=True)[source]

Stream a GEISA file from the GEISA website, then unzip it and build an HDF5 file directly.

Returns a Pandas DataFrame containing all lines.

Parameters
  • molecule (str) – any of the 58 GEISA 2020 molecules. See https://geisa.aeris-data.fr/interactive-access/?db=2020&info=ftp

  • local_databases (str) – where to create the RADIS HDF5 files. Default "~/.radisdb/geisa". Can be changed in radis.config["DEFAULT_DOWNLOAD_PATH"] or in ~/radis.json config file

  • databank_name (str) – name of the databank in the RADIS configuration file. Default "GEISA-{molecule}".

  • isotope (str, int or None) – load only certain isotopes: '2', '1,2', etc. If None, loads everything. Default None.

  • load_wavenum_min, load_wavenum_max (float (cm-1)) – load only specific wavenumbers.

  • columns (list of str) – list of columns to load. If None, returns all columns in the file.

Other Parameters
  • cache (True, False, 'regen' or 'force') – if True, use existing HDF5 file. If False or 'regen', rebuild it. If 'force', raise an error if cache file cannot be used (useful for debugging). Default True.

  • verbose (bool)

  • chunksize (int) – number of lines to process at a time. Higher values are usually faster but can cause memory problems and leave the user uninformed of the progress.

  • clean_cache_files (bool) – if True, delete the downloaded cache files once the HDF5 files are created.

  • return_local_path (bool) – if True, also returns the path of the local database file.

  • engine (‘pytables’, ‘vaex’, ‘default’) – which HDF5 library to use to parse local files. If ‘default’ use the value from ~/radis.json

  • output (‘pandas’, ‘vaex’, ‘jax’) – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays. If 'vaex', output is a vaex.dataframe.DataFrameLocal

    Note

    Vaex DataFrames are memory-mapped. They do not take any space in RAM and are extremely useful for dealing with the largest databases.

  • parallel (bool) – if True, uses joblib.Parallel to load the database with multiple processes

Returns

  • df (pd.DataFrame) – line list. An HDF5 file is also created in local_databases and referenced in the RADIS config file with name databank_name.

  • local_path (str) – path of local database file if return_local_path

Examples

from radis import fetch_geisa
df = fetch_geisa("CO")
print(df.columns)
>>> Index(['wav', 'int', 'airbrd', 'El', 'globu', 'globl', 'locu', 'locl',
    'Tdpgair', 'isoG', 'mol', 'idG', 'id', 'iso', 'A', 'selbrd', 'Pshft',
    'Tdpair', 'ierrA', 'ierrB', 'ierrC', 'ierrF', 'ierrO', 'ierrR', 'ierrN',
    'Tdpgself', 'ierrS', 'Pshfts', 'ierrT', 'Tdppself', 'ierrU'],
    dtype='object')
Compare CO spectrum from the GEISA and HITRAN database


Notes

If load_wavenum_min, load_wavenum_max or isotope is used, the whole database is still downloaded and uncompressed into fast-access HDF5 files in local_databases (which takes a long time on the first call). Only the requested wavenumber range and isotopes are returned. The HDF5 parsing uses hdf2df().
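
A short sketch of a restricted load, using only parameters documented above. The molecule, wavenumber bounds and column names are illustrative values; the wrapper load_co_lines and its fetch argument are hypothetical and exist only so the call can be exercised offline with a stand-in function.

```python
def load_co_lines(fetch=None):
    """Hypothetical wrapper: load CO lines between 2000 and 2300 cm-1."""
    if fetch is None:
        # Needs radis installed, plus network access on the first call.
        from radis import fetch_geisa as fetch
    return fetch(
        "CO",
        load_wavenum_min=2000,  # cm-1
        load_wavenum_max=2300,  # cm-1
        columns=["wav", "int", "El"],  # names taken from columns_GEISA
        return_local_path=True,  # also return the path of the local file
    )
```

With return_local_path=True the documented return value is the line list together with the local path, so a typical call would be df, local_path = load_co_lines(). Later calls should reuse the cached HDF5 file as long as cache=True (the default).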

gei2df(fname, cache=True, load_columns=None, verbose=True, drop_non_numeric=True, load_wavenum_min=None, load_wavenum_max=None, engine='pytables')[source]

Convert a GEISA [1] file to a Pandas DataFrame.

Parameters
  • fname (str) – GEISA file name.

  • cache (boolean, or ‘regen’) – if True, a pandas-readable HDF5 file is generated on first access and used afterwards. This saves the datatype cast and conversion and improves performance considerably (but later changes in the database are not taken into account). If False, no cache file is used. If ‘regen’, the cache files are rebuilt. Default True.

  • load_columns (list) – columns to load. If None, loads everything.

    Note

    This is only relevant when loading from a cache file. To generate the cache file, all columns are loaded anyway.
    
Other Parameters
  • drop_non_numeric (boolean) – if True, non-numeric columns are dropped. This improves performance, but make sure all the columns you need are converted to numeric formats beforehand. Default True. Note that if a cache file is loaded, it is left untouched.

  • load_wavenum_min, load_wavenum_max (float) – if not None, only load the cached file if it contains data for wavenumbers above/below the specified value. See load_h5_cache_file(). Default None.

  • engine (‘pytables’, ‘vaex’) – format of the HDF5 cache file. Default ‘pytables’.

Returns

df – dataframe containing all lines and parameters.

Return type

pandas DataFrame

Notes

The GEISA Database 2020 release can be downloaded from [2].

References

1

The 2020 edition of the GEISA spectroscopic database, Thibault Delahaye et al., 2021

2

GEISA Database 2020 release

See also

hit2df(), cdsd2df()

get_last(b)[source]

Get non-empty lines of a chunk b, parsing the bytes.
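
For illustration only, the behaviour described above (bytes in, non-empty lines out) can be sketched with the standard library. This is not the RADIS implementation; the function name non_empty_lines and the UTF-8 default are assumptions.

```python
def non_empty_lines(chunk, encoding="utf-8"):
    """Hypothetical helper: decode a bytes chunk and keep its non-empty lines."""
    text = chunk.decode(encoding)
    # splitlines() handles "\n" and "\r\n"; whitespace-only lines are dropped.
    return [line for line in text.splitlines() if line.strip()]


lines = non_empty_lines(b"first line\n\nsecond line\r\n   \n")
print(lines)
```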