radis.io.geisa module
Summary
GEISA database parser
- class GEISADatabaseManager(name, molecule, local_databases, engine='default', verbose=True, chunksize=100000, parallel=True)
Bases: DatabaseManager
- parse_to_local_file(opener, urlname, local_file, pbar_active=True, pbar_t0=0, pbar_Ntot_estimate_factor=None, pbar_Nlines_already=0, pbar_last=True)
Uncompress urlname into local_file. Also add metadata.
- Parameters
opener (an opener with an .open() command)
gfile (file handler. Filename: for info)
- columns_GEISA = {
'A': ('a10', <class 'float'>, 'Einstein A coefficient', 's-1'),
'El': ('a10', <class 'float'>, 'lower-state energy', 'cm-1'),
'Pshft': ('a9', <class 'float'>, 'air pressure-induced line shift at 296K', 'cm-1.atm-1'),
'Pshfts': ('a8', <class 'float'>, 'self pressure-induced line shift at 296K', 'cm-1.atm-1'),
'Tdpair': ('a4', <class 'float'>, 'temperature-dependance exponent for Gamma air', ''),
'Tdpnself': ('a4', <class 'float'>, 'temperature-dependance exponent for self pressure-induced line shift', ''),
'Tdppair': ('a6', <class 'float'>, 'temperature-dependance exponent for air pressure-induced line shift', ''),
'Tdpself': ('a4', <class 'float'>, 'temperature-dependance exponent for self-broadening halfwidth', ''),
'airbrd': ('a6', <class 'float'>, 'air-broadened half-width at 296K', 'cm-1.atm-1'),
'globl': ('a25', <class 'str'>, 'electronic and vibrational global lower quanta', ''),
'globu': ('a25', <class 'str'>, 'electronic and vibrational global upper quanta', ''),
'id': ('a2', <class 'int'>, 'Hitran molecular number', ''),
'idG': ('a3', <class 'str'>, 'Internal GEISA code for the data identification', ''),
'ierrA': ('a10', <class 'float'>, 'estimated accuracy on the line position', 'cm-1'),
'ierrB': ('a11', <class 'str'>, 'estimated accuracy on the intensity of the line', 'cm-1/(molecule/cm-2)'),
'ierrC': ('a6', <class 'float'>, 'estimated accuracy on the air collision halfwidth', 'cm-1.atm-1'),
'ierrF': ('a4', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the air-broadening halfwidth', ''),
'ierrN': ('a7', <class 'float'>, 'estimated accuracy on the self-broadened at 296K', 'cm-1.atm-1'),
'ierrO': ('a9', <class 'float'>, 'estimated accuracy on the air pressure shift of the line transition at 296K', 'cm-1.atm-1'),
'ierrR': ('a6', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the air pressure shift', ''),
'ierrS': ('a4', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the self-broadening halfwidth', ''),
'ierrT': ('a8', <class 'float'>, 'estimated accuracy on the self-pressure shift of the line transition at 296K', 'cm-1.atm-1'),
'ierrU': ('a4', <class 'float'>, 'estimated accuracy on the temperature dependence coefficient of the self pressure shift', ''),
'int': ('a11', <class 'str'>, 'intensity at 296K', 'cm-1/(molecule/cm-2)'),
'iso': ('a1', <class 'int'>, 'Hitran isotope number', ''),
'isoG': ('a3', <class 'int'>, 'GEISA isotope number', ''),
'locl': ('a15', <class 'str'>, 'electronic and vibrational local lower quanta', ''),
'locu': ('a15', <class 'str'>, 'electronic and vibrational local upper quanta', ''),
'mol': ('a3', <class 'int'>, 'GEISA molecular number', ''),
'selbrd': ('a7', <class 'float'>, 'self-broadened half-width at 296K', 'cm-1.atm-1'),
'wav': ('a12', <class 'float'>, 'vacuum wavenumber', 'cm-1')
}
parsing order of GEISA2020 format
- Type
OrderedDict
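As a rough illustration of how a fixed-width column specification like columns_GEISA can drive parsing, the sketch below slices one line into fields using the width codes ('a12', 'a11', ...) and casts each field to its listed type. This is not the RADIS implementation. The helper parse_fixed_width, the four-column subset, and the sample line are hypothetical. The column order is an assumption taken from the df.columns output shown in the fetch_geisa example on this page.

```python
from collections import OrderedDict

# Subset of the spec: name -> (width code, type), copied from columns_GEISA.
# ASSUMPTION: the order follows the df.columns output of the fetch_geisa example.
columns_subset = OrderedDict(
    [
        ("wav", ("a12", float)),
        ("int", ("a11", str)),  # listed as str in the spec, so left unconverted
        ("airbrd", ("a6", float)),
        ("El", ("a10", float)),
    ]
)


def parse_fixed_width(line, columns):
    """Slice `line` into consecutive fixed-width fields and cast each one."""
    record = {}
    start = 0
    for name, (width_code, cast) in columns.items():
        width = int(width_code[1:])  # 'a12' -> 12 characters
        raw = line[start : start + width]
        record[name] = cast(raw.strip())
        start += width
    return record


# Synthetic line (made-up values), right-aligned in 12 + 11 + 6 + 10 characters.
line = f"{'2143.271100':>12}{'4.520E-19':>11}{'0.0600':>6}{'3.8450':>10}"
record = parse_fixed_width(line, columns_subset)
print(record)
```

Keeping the spec in an ordered mapping means the byte offsets never need to be written out by hand: each field starts where the previous one ended.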
- fetch_geisa(molecule, local_databases=None, databank_name='GEISA-{molecule}', isotope=None, load_wavenum_min=None, load_wavenum_max=None, columns=None, cache=True, verbose=True, chunksize=100000, clean_cache_files=True, return_local_path=False, engine='default', output='pandas', parallel=True)
Stream a GEISA file from the GEISA website. Unzip it and build an HDF5 file directly.
Returns a Pandas DataFrame containing all lines.
- Parameters
molecule (str) – one of the 58 GEISA 2020 molecules. See https://geisa.aeris-data.fr/interactive-access/?db=2020&info=ftp
local_databases (str) – where to create the RADIS HDF5 files. Default "~/.radisdb/geisa". Can be changed in radis.config["DEFAULT_DOWNLOAD_PATH"] or in the ~/radis.json config file.
databank_name (str) – name of the databank in the RADIS configuration file. Default "GEISA-{molecule}".
isotope (str, int or None) – load only certain isotopes: '2', '1,2', etc. If None, loads everything. Default None.
load_wavenum_min, load_wavenum_max (float (cm-1)) – load only specific wavenumbers.
columns (list of str) – list of columns to load. If None, returns all columns in the file.
- Other Parameters
cache (True, False, 'regen' or 'force') – if True, use the existing HDF5 file. If False or 'regen', rebuild it. If 'force', raise an error if the cache file cannot be used (useful for debugging). Default True.
verbose (bool)
chunksize (int) – number of lines to process at the same time. Higher is usually faster but can create memory problems and keep the user uninformed of the progress.
clean_cache_files (bool) – if True, clean downloaded cache files after the HDF5 files are created.
return_local_path (bool) – if True, also returns the path of the local database file.
engine (‘pytables’, ‘vaex’, ‘default’) – which HDF5 library to use to parse local files. If ‘default’, use the value from ~/radis.json.
output (‘pandas’, ‘vaex’, ‘jax’) – format of the output DataFrame. If 'jax', returns a dictionary of jax arrays. If 'vaex', output is a vaex.dataframe.DataFrameLocal.
Note
Vaex DataFrames are memory-mapped. They do not take any space in RAM and are extremely useful to deal with the largest databases.
parallel (bool) – if True, uses joblib.parallel to load the database with multiple processes.
- Returns
df (pd.DataFrame) – line list. An HDF5 file is also created in local_databases and referenced in the RADIS config file with name databank_name.
local_path (str) – path of the local database file, if return_local_path is True.
Examples
from radis import fetch_geisa
df = fetch_geisa("CO")
print(df.columns)
>>> Index(['wav', 'int', 'airbrd', 'El', 'globu', 'globl', 'locu', 'locl', 'Tdpgair', 'isoG', 'mol', 'idG', 'id', 'iso', 'A', 'selbrd', 'Pshft', 'Tdpair', 'ierrA', 'ierrB', 'ierrC', 'ierrF', 'ierrO', 'ierrR', 'ierrN', 'Tdpgself', 'ierrS', 'Pshfts', 'ierrT', 'Tdppself', 'ierrU'], dtype='object')
Compare CO spectrum from the GEISA and HITRAN database
Notes
If using load_wavenum_min/load_wavenum_max or isotope, the whole database is still downloaded and uncompressed to local_databases as fast-access HDF5 files (which will take a long time on first call). Only the requested wavenumber range and isotopes are returned. The HDF5 parsing uses hdf2df().
See also
fetch_hitran(), fetch_exomol(), fetch_hitemp(), hdf2df(), fetch_databank()
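A minimal usage sketch combining several of the keyword arguments documented above. The wrapper name load_co_subset and the chosen isotope and wavenumber values are illustrative assumptions. The call itself needs RADIS installed and network access on first use, so the import is deferred until the function is called.

```python
def load_co_subset():
    """Fetch part of the GEISA CO line list (needs RADIS and network access)."""
    # Imported lazily so this sketch can be defined without RADIS installed.
    from radis import fetch_geisa

    # With return_local_path=True, the path of the local database file
    # is returned in addition to the line list.
    df, local_path = fetch_geisa(
        "CO",
        isotope="1,2",          # illustrative: restrict to isotopes 1 and 2
        load_wavenum_min=2000,  # cm-1, illustrative range
        load_wavenum_max=2300,  # cm-1
        return_local_path=True,
    )
    return df, local_path
```

Calling load_co_subset() a second time should be much faster than the first call, since, as the Notes say, the full database is downloaded and converted to HDF5 only once and then reused when cache=True (the default).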
- gei2df(fname, cache=True, load_columns=None, verbose=True, drop_non_numeric=True, load_wavenum_min=None, load_wavenum_max=None, engine='pytables')
Convert a GEISA [1] file to a Pandas dataframe.
- Parameters
fname (str) – GEISA file name.
cache (boolean, or ‘regen’) – if True, a pandas-readable HDF5 file is generated on first access, and later used. This saves on the datatype cast and conversion and improves performance a lot (but changes in the database are not taken into account). If False, no database is used. If ‘regen’, temp files are reconstructed. Default True.
load_columns (list) – columns to load. If None, loads everything.
Note
This is only relevant when loading from a cache file. To generate the cache file, all columns are loaded anyway.
- Other Parameters
drop_non_numeric (boolean) – if True, non-numeric columns are dropped. This improves performance, but make sure all the columns you need are converted to numeric formats beforehand. Default True. Note that if a cache file is loaded it will be left untouched.
load_wavenum_min, load_wavenum_max (float) – if not None, only load the cached file if it contains data for wavenumbers above/below the specified value. See load_h5_cache_file(). Default None.
engine (‘pytables’, ‘vaex’) – format for the HDF5 cache file, pytables by default.
- Returns
df – dataframe containing all lines and parameters.
- Return type
pandas Dataframe
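A short sketch of calling gei2df on a GEISA file that is already on disk. The wrapper load_geisa_file is a hypothetical name, and the caller must supply the path. It assumes gei2df is importable from radis.io.geisa, the module documented on this page. Passing drop_non_numeric=False keeps the string columns, such as globu and globl, that would otherwise be dropped.

```python
def load_geisa_file(fname):
    """Parse a local GEISA file into a DataFrame (needs RADIS installed)."""
    # Imported lazily so this sketch can be defined without RADIS installed.
    from radis.io.geisa import gei2df

    # cache="regen" rebuilds the HDF5 cache file instead of reusing an old one.
    return gei2df(fname, cache="regen", drop_non_numeric=False)
```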
Notes
The GEISA Database 2020 release can be downloaded from [2].
References