Module to host the databank loading / database initialisation parts of
SpectrumFactory. This is done through SpectrumFactory
inheritance of the DatabankLoader class defined here.
RADIS includes automatic rebuilding of deprecated cache files, plus a global variable
to force regenerating them after a given version. See the "OLDEST_COMPATIBLE_VERSION"
key in radis.config.
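A minimal sketch of such a version check, assuming each cache file records the RADIS version that wrote it and that versions are plain dotted integers. The helpers `parse_version` and `cache_is_deprecated` are hypothetical, not RADIS functions:

```python
def parse_version(version):
    """Turn a dotted version string such as '0.9.28' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def cache_is_deprecated(cache_version, oldest_compatible_version):
    """Return True if a cache written by `cache_version` must be regenerated.

    Hypothetical rule: a cache is deprecated when it was produced by a version
    strictly older than OLDEST_COMPATIBLE_VERSION.
    """
    return parse_version(cache_version) < parse_version(oldest_compatible_version)
```

Comparing tuples of integers avoids the classic string-comparison bug where "0.10.0" would sort before "0.9.28". Real version strings with suffixes (e.g. release candidates) would need a full parser such as `packaging.version.Version`.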
——————————————————————————-
A class to hold Spectrum calculation input conditions
(Input), computation parameters
(Parameters), or miscellaneous parameters
(MiscParams).
Works like a dict, except you can also access attributes with:
    v = a.key  # equivalent to v = a[key]
Also can be copied, deepcopied, and parallelized in multiprocessing
Notes
For developers:
Parameters and Input could also have simply derived from the (object) class,
but they would have missed some convenient functions implemented for dict,
for instance how to be pickled / unpickled.
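The dict-with-attribute-access pattern described above can be sketched as follows. `AttrDict` is a hypothetical stand-in, not the RADIS class itself; it only illustrates why deriving from dict keeps copying and pickling cheap:

```python
class AttrDict(dict):
    """Hypothetical dict subclass allowing ``d.key`` as well as ``d[key]``."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails: fall back to dict keys.
        try:
            return self[key]
        except KeyError:
            # Raising AttributeError (not KeyError) keeps hasattr(), copy and
            # pickle working, since they probe for optional special attributes.
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Store attributes as dict items so both access styles stay in sync.
        self[key] = value

    def __delattr__(self, key):
        try:
            del self[key]
        except KeyError:
            raise AttributeError(key) from None
```

Because all data lives in the dict items, `copy.copy`, `copy.deepcopy` and `pickle` reuse the built-in dict machinery with no custom `__getstate__`, which is the convenience the note above refers to.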
Returns the variables (and their values) contained in the
dictionary, minus some based on their type. Numpy arrays, dictionaries
and pandas DataFrames are removed. None is removed in general, except
for some keys ('cutoff', 'truncation').
Tuples are converted to strings.
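The filtering rules above can be sketched without importing numpy or pandas by checking type names. `filter_params` is hypothetical, and the name-based check (`ndarray`, `DataFrame`) is an assumption made to keep the sketch dependency-free:

```python
KEEP_IF_NONE = ("cutoff", "truncation")  # keys whose None value is meaningful


def filter_params(params):
    """Return a copy of `params` without arrays, dicts, DataFrames or stray Nones."""
    out = {}
    for key, value in params.items():
        # Drop dictionaries, numpy arrays and pandas DataFrames (matched by type name).
        if isinstance(value, dict) or type(value).__name__ in ("ndarray", "DataFrame"):
            continue
        # Drop None, except for the keys where None carries information.
        if value is None and key not in KEEP_IF_NONE:
            continue
        # Tuples are stored as their string representation.
        if isinstance(value, tuple):
            value = str(value)
        out[key] = value
    return out
```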
Initial line database after loading.
If, for any reason, you want to manipulate the line database manually (for instance, keeping only lines emitted
by a particular level), you need to access the df0 attribute of
SpectrumFactory.
Warning
Never overwrite the df0 attribute, else some metadata may be lost in the process.
Only use inplace operations. If reducing the number of lines, add
a df0.reset_index().
For instance:
    from radis import SpectrumFactory
    sf = SpectrumFactory(wavenum_min=2150.4, wavenum_max=2151.4, pressure=1, isotope=1)
    sf.load_databank('HITRAN-CO-TEST')
    sf.df0.drop(sf.df0[sf.df0.vu != 1].index, inplace=True)  # keep lines emitted by v'=1 only
    sf.eq_spectrum(Tgas=3000, name='vu=1').plot()
df0 contains the lines as they are loaded from the database.
df1 is generated during the spectrum calculation, after the
line database reduction steps, population calculation, and scaling of intensity and broadening parameters
with the calculated conditions.
Fetch the latest files from [HITRAN-2020], [HITEMP-2010] (or newer),
[ExoMol-2020] or [GEISA-2020], and store them locally in memory-mapping
formats for extremely fast access.
Parameters:
source ('hitran', 'hitemp', 'exomol', 'geisa') – which database to use.
database ('full', 'range', name of an ExoMol database, or 'default') – if fetching from HITRAN, 'full' downloads the full database and registers
it; 'range' downloads only the lines of the molecule within the calculation range.
Note
'range' will be faster, but will require a new download each time
you change the range. 'full' is slower the 1st time and takes
more disk space, but will be faster over time.
Default is 'full'.
If fetching from HITEMP, only 'full' is available.
If fetching from 'exomol', use this parameter to choose which database
to use. Keep 'default' to use the recommended one. See all available databases
with radis.io.exomol.get_exomol_database_list()
By default, databases are downloaded in ~/.radisdb.
This can be changed in radis.config["DEFAULT_DOWNLOAD_PATH"] or in the
~/radis.json config file.
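The source/database rules and the download-path fallback listed above can be sketched as a small validation helper. `resolve_fetch_options` is hypothetical, and the plain `config` dict stands in for radis.config; RADIS's own checks may be organized differently:

```python
import os

SOURCES = ("hitran", "hitemp", "exomol", "geisa")


def resolve_fetch_options(source, database, config=None):
    """Validate (source, database) and pick the download folder.

    Hypothetical helper mirroring the documented rules; `config` stands in
    for radis.config, and only its DEFAULT_DOWNLOAD_PATH key is read.
    """
    source = source.lower()
    if source not in SOURCES:
        raise ValueError(f"Unknown source {source!r}, expected one of {SOURCES}")
    if source == "hitran" and database not in ("full", "range"):
        raise ValueError("HITRAN accepts database='full' or 'range'")
    if source == "hitemp" and database != "full":
        raise ValueError("Only database='full' is available for HITEMP")
    # For ExoMol, `database` is a database name or 'default' (recommended one).
    path = (config or {}).get("DEFAULT_DOWNLOAD_PATH", "~/.radisdb")
    return source, database, os.path.expanduser(path)
```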
Other Parameters:
parfuncfmt ('cdsd', 'hapi', 'exomol', or any of KNOWN_PARFUNCFORMAT) – format to read tabulated partition function file. If 'hapi', then
[HAPI] (HITRAN Python interface) is used to retrieve [TIPS-2020]
tabulated partition functions.
If 'exomol' then partition functions are downloaded from ExoMol.
Default 'hapi'.
parfunc (filename or None) – path to a tabulated partition function file to use.
levels (dict of str or None) –
path to energy levels (needed for non-eq calculations). Format:
{1: path_to_levels_iso_1, 3: path_to_levels_iso3}.
Default None
levelsfmt ('cdsd-pc', 'radis' (or any of KNOWN_LVLFORMAT) or None) – how to read the previous file. Known formats: (see KNOWN_LVLFORMAT).
If radis, energies are calculated using the diatomic constants in radis.db database
if available for given molecule. Look up references there.
If None, non equilibrium calculations are not possible. Default 'radis'.
load_energies (boolean) – if False, don't load energy levels. This means that nonequilibrium
spectra cannot be calculated, but it saves some memory. Default False
include_neighbouring_lines (bool) – if True, includes off-range, neighbouring lines that contribute
because of lineshape broadening. The neighbour_lines
parameter is used to determine the limit. Default True.
parse_local_global_quanta (bool, or 'auto') – if True, parses the HITRAN/HITEMP ‘glob’ and ‘loc’ columns to extract
quanta identifying the lines. Required for nonequilibrium calculations,
or to use line_survey(),
but takes up more space.
drop_non_numeric (boolean) – if True, non-numeric columns are dropped. This improves performance,
but make sure all the columns you need are converted to numeric formats
beforehand. Default True. Note that if a cache file is loaded, it
will be left untouched.
db_use_cached (bool, or 'regen') – use cached files. If 'regen', existing cached files are removed and regenerated.
memory_mapping_engine ('pytables', 'vaex', 'feather', or 'default') – which library to use to read HDF5 files or other memory-mapping formats.
'pytables' and 'vaex' HDF5 files are incompatible: 'pytables' is row-major while 'vaex' is column-major.
If 'default', use the value from ~/radis.json ["MEMORY_MAPPING_ENGINE"]
parallel (bool) – if True, uses joblib.Parallel to load the database with multiple processes
(works only for HITEMP files)
load_columns (list, 'all', 'equilibrium', 'noneq', or diluent) – columns names to load.
If 'equilibrium', only load the columns required for equilibrium
calculations. If 'noneq', also load the columns required for
non-LTE calculations. See drop_all_but_these.
If 'all', load everything. Note that for performance, it is
better to load only certain columns rather than loading them all
and dropping them with drop_columns.
If diluent, then all additional columns required for calculating the spectrum
in that diluent are loaded.
Default 'equilibrium'.
Warning
if using 'equilibrium', not all parameters will be available
for a Spectrum line_survey().
If you are calculating equilibrium (LTE) spectra, it is recommended to
use 'equilibrium'. If you are calculating non-LTE spectra, it is
recommended to use 'noneq'.
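How the load_columns presets relate to each other can be sketched as follows. The column lists are placeholders chosen for illustration, not RADIS's actual lists, `resolve_load_columns` is hypothetical, and the diluent case is omitted:

```python
# Placeholder column lists (hypothetical): the real lists live in RADIS and may differ.
EQUILIBRIUM_COLUMNS = ["wav", "int", "El", "airbrd", "selbrd", "Tdpair", "Pshft"]
NONEQ_EXTRA_COLUMNS = ["globu", "globl", "locu", "locl"]


def resolve_load_columns(load_columns="equilibrium"):
    """Translate the load_columns option into a list of names (None = load all)."""
    if load_columns == "all":
        return None  # no filtering: every column in the file is loaded
    if load_columns == "equilibrium":
        return list(EQUILIBRIUM_COLUMNS)
    if load_columns == "noneq":
        # Non-LTE needs everything equilibrium needs, plus quantum-number columns.
        return EQUILIBRIUM_COLUMNS + NONEQ_EXTRA_COLUMNS
    if isinstance(load_columns, (list, tuple)):
        return list(load_columns)
    raise ValueError(f"Unsupported load_columns: {load_columns!r}")
```

Selecting columns before reading, rather than dropping them afterwards, is what makes the 'equilibrium' preset cheaper than 'all' followed by drop_columns.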
Notes
HITRAN is fetched with Astroquery [1]_ or [HAPI], and HITEMP with
fetch_hitemp()
HITEMP files are stored by default in the ~/.radisdb folder.
Get all parameters defined in the SpectrumFactory.
Other Parameters:
ignore_misc (boolean) – if True, all attributes considered as Factory 'descriptive'
parameters, as defined in get_conditions(), are ignored when
comparing the database to the current factory conditions. These should
only be attributes that have no impact on the Spectrum
produced by the factory. Default False
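The effect of ignore_misc can be sketched as a comparison that skips descriptive keys. `same_conditions` and its `misc_keys` argument are hypothetical; which keys count as descriptive is decided by get_conditions() in RADIS:

```python
def same_conditions(stored, current, misc_keys=(), ignore_misc=False):
    """Return True if two condition dicts describe the same spectrum.

    Keys listed in `misc_keys` are purely descriptive (they cannot change the
    computed spectrum); they are skipped when `ignore_misc` is True.
    A key missing on one side compares as None.
    """
    skip = set(misc_keys) if ignore_misc else set()
    keys = (set(stored) | set(current)) - skip
    return all(stored.get(k) == current.get(k) for k in keys)
```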
Method to init databank parameters, but only load them when needed.
The databank is reloaded by _check_line_databank().
Same input parameters as load_databank():
Parameters:
name (a section name specified in your ~/radis.json) – ~/radis.json has to be created in your HOME (Unix) / User (Windows) folder. If
not None, all other arguments are discarded.
Note that all files in the database will be loaded and it may take some
time. Better limit the database size if you already know what
range you need. See Configuration file and
DBFORMAT for expected
~/radis.json format
Other Parameters:
path (str, list of str, None) – list of database files, or name of a predefined database in the
Configuration file (json)
Accepts wildcards * to select multiple files
format ('hitran', 'cdsd-hitemp', 'cdsd-4000', or any of KNOWN_DBFORMAT) – database type. 'hitran' for HITRAN/HITEMP, 'cdsd-hitemp'
and 'cdsd-4000' for the different CDSD versions. Default 'hitran'
parfuncfmt ('hapi', 'cdsd', or any of KNOWN_PARFUNCFORMAT) – format to read tabulated partition function file. If hapi, then
HAPI (HITRAN Python interface) [1]_ is used to retrieve them (valid if
your database is HITRAN data). HAPI is embedded into RADIS; check the
version. If parfuncfmt is None then hapi is used. Default hapi.
parfunc (filename or None) – path to tabulated partition function to use.
If parfuncfmt is hapi then parfunc should be the link to the
hapi.py file. If not given, then the hapi.py embedded in RADIS is used (check version)
levels (dict of str or None) – path to energy levels (needed for non-eq calculations). Format:
{1:path_to_levels_iso_1, 3:path_to_levels_iso3}. Default None
levelsfmt (‘cdsd-pc’, ‘radis’ (or any of KNOWN_LVLFORMAT) or None) – how to read the previous file. Known formats: (see KNOWN_LVLFORMAT).
If radis, energies are calculated using the diatomic constants in radis.db database
if available for given molecule. Look up references there.
If None, non equilibrium calculations are not possible. Default 'radis'.
db_use_cached (boolean, 'regen', or None) – if True, a pandas-readable csv file is generated on first access,
and later used. This saves on the datatype cast and conversion and
greatly improves performance. However, be sure to delete these files
to regenerate them if you change the database. If 'regen',
existing cached files are removed and regenerated.
It is also used to load energy levels from the .h5 cache file, if it exists.
If None, the value given on Factory creation is used. Default None
load_energies (boolean) – if False, don't load energy levels. This means that nonequilibrium
spectra cannot be calculated, but it saves some memory. Default True
include_neighbouring_lines (bool) – if True, includes off-range, neighbouring lines that contribute
because of lineshape broadening. The neighbour_lines
parameter is used to determine the limit. Default True.
drop_columns (list) – columns names to drop from Line DataFrame after loading the file.
Not recommended to use, unless you explicitly want to drop information
(for instance when dealing with very large databases). If [], nothing
is dropped. If 'auto', parameters considered unnecessary
are dropped. See drop_auto_columns_for_dbformat
and drop_auto_columns_for_levelsfmt.
Default 'auto'.
load_columns (list, 'all', 'equilibrium', 'noneq') – columns names to load.
If 'equilibrium', only load the columns required for equilibrium
calculations. If 'noneq', also load the columns required for
non-LTE calculations. See drop_all_but_these.
If 'all', load everything. Note that for performance, it is
better to load only certain columns rather than loading them all
and dropping them with drop_columns.
Default 'equilibrium'.
Warning
if using 'equilibrium', not all parameters will be available
for a Spectrum line_survey().
*Other arguments are related to how to open the files*
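The db_use_cached behaviour described above (reuse a cache, or wipe and rebuild it with 'regen') can be sketched generically. `load_with_cache` is hypothetical, and JSON stands in for the actual cache format; it also ignores the None option, which defers to the Factory setting:

```python
import json
import os


def load_with_cache(source_file, parse, db_use_cached=True):
    """Parse `source_file`, reusing a cache file next to it when allowed.

    db_use_cached: True    -> read the cache if present, else parse and write it
                   'regen' -> delete any existing cache, then parse and write it
                   False   -> always parse, never touch the cache
    """
    cache_file = source_file + ".cache.json"  # hypothetical cache naming
    if db_use_cached == "regen" and os.path.exists(cache_file):
        os.remove(cache_file)
    if db_use_cached is True and os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f)  # fast path: skip parsing and type conversion
    data = parse(source_file)
    if db_use_cached in (True, "regen"):
        with open(cache_file, "w") as f:
            json.dump(data, f)
    return data
```

Note that nothing here detects a changed source file: a stale cache is silently reused, which is why the parameter description asks you to regenerate caches after changing the database.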
Notes
Useful in conjunction with init_database()
when dealing with large line databanks when some of the spectra may have
been precomputed in a spectrum database (SpecDatabase)
Note that any previously loaded databank is discarded when this method is called.
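The "init now, load when needed" idea can be sketched as a lazily evaluated property. `LazyDatabank` is hypothetical and only mirrors the pattern, not how init_databank() and _check_line_databank() are actually wired:

```python
class LazyDatabank:
    """Hypothetical sketch: remember how to load a databank, load it on demand."""

    def __init__(self, loader, **load_kwargs):
        self._loader = loader          # callable that actually reads the files
        self._load_kwargs = load_kwargs
        self._lines = None             # nothing loaded yet

    def is_loaded(self):
        return self._lines is not None

    @property
    def lines(self):
        # First access triggers the (possibly slow) load; later accesses reuse it.
        if self._lines is None:
            self._lines = self._loader(**self._load_kwargs)
        return self._lines
```

If every requested spectrum is found in a SpecDatabase, `lines` is never accessed and the expensive load never happens.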
Initialize a SpecDatabase folder in
path to later store spectra. Spectra can also be automatically
retrieved from the database instead of being calculated.
Parameters:
path (str) – path to database folder. If it doesn't exist, it is created.
Accepts wildcards * to select multiple files
autoretrieve (boolean, or 'force') – if True, a database lookup is performed whenever a new spectrum
is calculated. If the spectrum already exists then it is retrieved
from the database instead of being calculated. Spectra are considered
the same if all the stored conditions fit. If set to 'force', an error
is raised if the spectrum is not found in the database (use it for
debugging). Default True
autoupdate (boolean) – if True, all spectra calculated by this Factory are automatically
exported in database. Default True (but only if init_database is
explicitly called by user)
add_info (list, or None/False) – append these parameters and their values if they are in conditions.
Default ['Tvib','Trot']
add_date (str, or None/False) – adds date in strftime format to the beginning of the filename.
Default ‘%Y%m%d’
compress (boolean, or 2) – if True, spectra are read and written in binary format. This is faster,
and takes less disk space. Default True.
If 2, additionally remove all redundant quantities.
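How add_date and add_info could combine into a file name is sketched below. The naming scheme (underscore separators, `.spec` extension) and the helper `spectrum_filename` are assumptions for illustration, not the exact format SpecDatabase produces:

```python
from datetime import datetime


def spectrum_filename(name, conditions, add_info=("Tvib", "Trot"),
                      add_date="%Y%m%d", now=None):
    """Build a file name such as '20210304_co2_Tvib1000.spec' (hypothetical scheme)."""
    parts = []
    if add_date:
        # Date goes first, formatted with strftime.
        parts.append((now or datetime.now()).strftime(add_date))
    parts.append(name)
    for key in add_info or ():
        # Only parameters that are actually present in the conditions are appended.
        if key in conditions:
            parts.append(f"{key}{conditions[key]}")
    return "_".join(parts) + ".spec"
```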
Other Parameters:
**kwargs (**dict) – arguments sent to SpecDatabase initialization.
Returns:
db – the database where spectra will be stored or retrieved
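The autoretrieve / autoupdate logic can be sketched with an in-memory store. `TinySpecDatabase` and `get_spectrum` are hypothetical stand-ins for SpecDatabase and the Factory's calculation path; matching here is plain dict equality of the conditions:

```python
class TinySpecDatabase:
    """Hypothetical in-memory stand-in for SpecDatabase."""

    def __init__(self):
        self._entries = []  # list of (conditions, spectrum) pairs

    def add(self, conditions, spectrum):
        self._entries.append((dict(conditions), spectrum))

    def find(self, conditions):
        # A stored spectrum matches only if all its stored conditions fit.
        for stored, spectrum in self._entries:
            if stored == conditions:
                return spectrum
        return None


def get_spectrum(db, conditions, compute, autoretrieve=True, autoupdate=True):
    if autoretrieve:
        found = db.find(conditions)
        if found is not None:
            return found  # retrieved instead of calculated
        if autoretrieve == "force":
            raise ValueError(f"Spectrum not found in database: {conditions}")
    spectrum = compute(conditions)
    if autoupdate:
        db.add(conditions, spectrum)  # export for later reuse
    return spectrum
```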
Loads databank from shortname in the Configuration file (json), or by manually setting all
attributes.
Databank includes:
- lines
- partition function & format (tabulated or calculated)
- (optional) energy levels, format
Parameters:
name (a section name specified in your ~/radis.json) – ~/radis.json has to be created in your HOME (Unix) / User (Windows) folder. If
not None, all other arguments are discarded.
Note that all files in the database will be loaded and it may take some
time. Better limit the database size if you already know what
range you need. See Configuration file and
DBFORMAT for expected
~/radis.json format
Other Parameters:
path (str, list of str, None) – list of database files, or name of a predefined database in the
Configuration file (json)
Accepts wildcards * to select multiple files
format ('hitran', 'cdsd-hitemp', 'cdsd-4000', or any of KNOWN_DBFORMAT) – database type. 'hitran' for HITRAN/HITEMP, 'cdsd-hitemp'
and 'cdsd-4000' for the different CDSD versions. Default 'hitran'
parfuncfmt ('hapi', 'cdsd', or any of KNOWN_PARFUNCFORMAT) – format to read tabulated partition function file. If hapi, then
HAPI (HITRAN Python interface) [1]_ is used to retrieve them (valid if
your database is HITRAN data). HAPI is embedded into RADIS; check the
version. If parfuncfmt is None then hapi is used. Default hapi.
parfunc (filename or None) – path to tabulated partition function to use.
If parfuncfmt is hapi then parfunc should be the link to the
hapi.py file. If not given, then the hapi.py embedded in RADIS is used (check version)
levels (dict of str or None) – path to energy levels (needed for non-eq calculations). Format:
{1:path_to_levels_iso_1, 3:path_to_levels_iso3}. Default None
levelsfmt (‘cdsd-pc’, ‘radis’ (or any of KNOWN_LVLFORMAT) or None) – how to read the previous file. Known formats: (see KNOWN_LVLFORMAT).
If radis, energies are calculated using the diatomic constants in radis.db database
if available for given molecule. Look up references there.
If None, non equilibrium calculations are not possible. Default 'radis'.
db_use_cached (boolean, 'regen', or None) – if True, a pandas-readable csv file is generated on first access,
and later used. This saves on the datatype cast and conversion and
greatly improves performance. However, be sure to delete these files
to regenerate them if you change the database. If 'regen',
existing cached files are removed and regenerated.
It is also used to load energy levels from the .h5 cache file, if it exists.
If None, the value given on Factory creation is used. Default True
load_energies (boolean) – if False, don't load energy levels. This means that nonequilibrium
spectra cannot be calculated, but it saves some memory. Default True
include_neighbouring_lines (bool) – if True, includes off-range, neighbouring lines that contribute
because of lineshape broadening. The neighbour_lines
parameter is used to determine the limit. Default True.
*Other arguments are related to how to open the files*
drop_columns (list) – columns names to drop from Line DataFrame after loading the file.
Not recommended to use, unless you explicitly want to drop information
(for instance when dealing with very large databases). If [], nothing
is dropped. If 'auto', parameters considered unnecessary
are dropped. See drop_auto_columns_for_dbformat
and drop_auto_columns_for_levelsfmt.
If 'all', parameters considered unnecessary for equilibrium calculations
are dropped, including all information about lines that would otherwise be
available in Spectrum methods such as line_survey().
Warning: nonequilibrium calculations are not possible in this mode.
Default 'auto'.
load_columns (list, 'all', 'equilibrium', 'noneq') – columns names to load.
If 'equilibrium', only load the columns required for equilibrium
calculations. If 'noneq', also load the columns required for
non-LTE calculations. See drop_all_but_these.
If 'all', load everything. Note that for performance, it is
better to load only certain columns rather than loading them all
and dropping them with drop_columns.
Default 'equilibrium'.
Warning
if using 'equilibrium', not all parameters will be available
for a Spectrum line_survey().
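The "Accepts wildcards * to select multiple files" behaviour of `path` can be sketched with the standard `glob` module. `expand_database_paths` is hypothetical, and raising on an unmatched pattern is a design choice of this sketch:

```python
import glob
import os


def expand_database_paths(path):
    """Expand a path, or list of paths, that may contain * wildcards."""
    if isinstance(path, str):
        path = [path]
    files = []
    for pattern in path:
        # sorted() makes the loading order deterministic across platforms.
        matches = sorted(glob.glob(os.path.expanduser(pattern)))
        if not matches:
            raise FileNotFoundError(f"No file matches {pattern!r}")
        files.extend(matches)
    return files
```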
isotope (int, or list) – isotope number(s), sorted by terrestrial abundance
abundance (float, or list) – abundance value(s) to set for the given isotope(s)
Examples
    from radis import SpectrumFactory
    sf = SpectrumFactory(
        2284.2,
        2284.6,
        wstep=0.001,  # cm-1
        pressure=20 * 1e-3,  # bar
        mole_fraction=400e-6,
        molecule="CO2",
        isotope="1,2",
        verbose=False,
    )
    sf.load_databank("HITEMP-CO2-TEST")
    print("Abundance of CO2[1,2]", sf.get_abundance("CO2", [1, 2]))
    sf.eq_spectrum(2000).plot("abscoeff")

    #%% Set the abundance of CO2(626) to 0.8; and the abundance of CO2(636) to 0.2 (arbitrary):
    sf.set_abundance("CO2", [1, 2], [0.8, 0.2])
    print("New abundance of CO2[1,2]", sf.get_abundance("CO2", [1, 2]))
    sf.eq_spectrum(2000).plot("abscoeff", nfig="same")
Trigger a warning, raise an error, or ignore, depending on the value
defined in the warnings
dictionary.
The warnings can thus be deactivated selectively by setting the SpectrumFactory warnings attribute.
Parameters:
category (str) – one of the keys of self.warnings. See warnings
level (int) – warning level. Only print warnings when the verbose level is greater than or equal to
the warning level, i.e., warnings of level 1 appear only
if verbose==True, warnings of level 2 appear only
for verbose>=2, etc. Warnings of level 0 appear all the time.
Default 0
Examples
::

    if not ((df.Erotu > tol).all() and (df.Erotl > tol).all()):
        self.warn(
            "There are negative rotational energies in the database",
            "NegativeEnergiesWarning",
        )
Notes
All warnings in the SpectrumFactory should call this method rather
than the default warnings.warn() method, because it allows easier runtime
modification of how to deal with warnings.
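A category-based dispatcher of the kind described above can be sketched with the standard `warnings` module. This `warn` function is a hypothetical free-function version, with `warnings_config` standing in for the Factory's warnings dictionary; raising `UserWarning` as an exception is an assumption of the sketch:

```python
import warnings


def warn(message, category="default", level=0, warnings_config=None, verbose=True):
    """Hypothetical category-based warning dispatcher.

    warnings_config maps a category to 'warn', 'error' or 'ignore'
    (unknown categories default to 'warn').
    """
    action = (warnings_config or {}).get(category, "warn")
    if action == "ignore":
        return
    if action == "error":
        raise UserWarning(message)  # errors are raised regardless of level
    # Level 0 always shows; level n shows only when verbose >= n (True counts as 1).
    if verbose >= level:
        warnings.warn(message, UserWarning, stacklevel=2)
```

Routing every warning through one function means a user can silence or escalate a category at runtime by editing a single dictionary, instead of configuring filters for each call site.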
Holds Spectrum calculation input conditions, under the attribute
input of
SpectrumFactory.
Works like a dict, except you can also access attributes with:
    v = sf.input.key  # equivalent to v = sf.input[key]
'cdsd-hamil': energies read from precomputed CDSD energies for CO2, with
viblvl=(p,c,J,N) convention, i.e., each rovibrational level can have a
unique vibrational energy (this is needed when taking coupling terms into account).
See PartFuncCO2_CDSDcalc
None: means you can only do equilibrium calculations.
Known formats for partition function (tabulated files to read), or ‘hapi’
to fetch Partition Functions using HITRAN Python interface instead of reading
a tabulated file.
A class to hold Spectrum calculation descriptive parameters, under the attribute
params of
SpectrumFactory.
Unlike Parameters, these parameters cannot influence the
Spectrum output and will not be used when comparing Spectrum with existing,
precomputed spectra in SpecDatabase.
Works like a dict, except you can also access attributes with:
    v = a.key  # equivalent to v = a[key]
Holds Spectrum calculation computation parameters, under the attribute
params of
SpectrumFactory.
Works like a dict, except you can also access attributes with:
    v = sf.params.key  # equivalent to v = sf.params[key]
Also can be copied, deepcopied, and parallelized in multiprocessing
Metadata of line DataFrames df0 and
df1.
@dev: when having only 1 molecule and 1 isotope, these parameters are
constant for all rovibrational lines. Thus, it's faster and much more
memory-efficient to transport them as attributes of the DataFrame
rather than as columns. The syntax is the same, so the operations do
not change, i.e.:
    k_b / df.molar_mass
will work whether molar_mass is a float or a column.
Warning
However, in the current pandas implementation of DataFrame,
attributes are lost whenever the DataFrame is recreated, transposed,
or pickled.
Thus, we use transfer_metadata() to keep
the attributes after an operation, and expand_metadata()
to make them columns before a serializing operation (e.g., multiprocessing).
@dev: all of that is a high-end optimization. Users should not deal
with internal DataFrames.
Drop all columns but these if using drop_columns='all' in load_databank.
Note: nonequilibrium calculations won't be possible anymore, and it won't be possible
to identify lines with line_survey().