Py4CATS --- FAQ

Forecasted Asked Questions

(Answers specific to the (I)Python shell / Jupyter notebook or to the Unix/Linux command line shell (terminal/console) are given in blue and green, respectively.)


Reading a pickle file of radiances with riRead fails, (I)Python response ModuleNotFoundError: No module named 'py4CAtS'?

Quick solution/recipe: In the pickle-radiance file(s), replace all occurrences of py4CAtS by py4cats (i.e. change upper case to lower case). Although a pickle file is not a simple ASCII text file, you can do this with an editor (at least with vi: :%s/py4CAtS/py4cats/g). If you have several pickle-radiance files, you can automate this change with a sed script inside a loop.
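The same replacement can also be done from Python. The following is only a minimal sketch: the function names, the backup suffix, and the file pattern are assumptions made for illustration and are not part of Py4CAtS. It relies on the fact that both names have the same length, so the string lengths stored inside the pickle stay valid.

```python
from pathlib import Path

OLD, NEW = b"py4CAtS", b"py4cats"   # same number of bytes, so the pickle stays consistent


def fix_pickle_module_name(path, backup_suffix=".bak"):
    """Replace the old package name in one pickle file; return the number of replacements."""
    path = Path(path)
    data = path.read_bytes()
    count = data.count(OLD)
    if count:
        # keep a copy of the original file before overwriting it
        path.with_name(path.name + backup_suffix).write_bytes(data)
        path.write_bytes(data.replace(OLD, NEW))
    return count


def fix_all(directory=".", pattern="*.pickle"):
    """Apply the fix to every file matching the (hypothetical) file name pattern."""
    return {str(p): fix_pickle_module_name(p) for p in sorted(Path(directory).glob(pattern))}


# Usage (adjust directory and pattern to your own files):
# print(fix_all("/path/to/radiances", "*.pickle"))
```

A backup copy is written first because the file is modified in place.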

Background explanation: Probably the radiance data pickled in this file have been generated (and stored) with an "old, flat" version of Py4CAtS, where all sources (Python modules) were collected in a single directory called src within a directory py4CAtS (i.e. with a mix of lower and upper case letters). To avoid any confusion about lower/upper case letters, the top level directory of the "new" package has been renamed to py4cats (all letters lower case, so you can import all functions with from py4cats import *).
More background: The problem is due to the fact that in the pickle file some of the attributes of the radiance array (riArray) are saved using py4cats' internal data structures, i.e. the x attribute defining the wavenumber interval of the radiance spectrum is saved as an instance of the Interval class defined in the py4cats.var.pairTypes module.

Why not unify the two scripts higstract and lbl2od?

Typically, extracting lines from Hitran or Geisa in a certain spectral range will provide line lists of a dozen molecules or so. Some of these molecules are probably very "exotic", i.e. they might have very small concentrations and hence will "normally" not contribute significantly to the total optical depth. However, especially heavy molecules such as ClONO2 will have thousands or even tens of thousands of lines and would slow down the line-by-line calculation considerably.

Therefore it is up to the user to decide which molecules should be taken into account for cross sections, absorption coefficients, and optical depths: First extract (higstract) the line parameters from Hitran or Geisa and check the line parameter lists or files (you can do this graphically with plot_atlas *.vSEan in the Unix shell or the atlas function in the (I)Python shell); then continue to optical depths with only the 'important' molecules.

If you already know YOUR molecules, then you can simply invoke higstract individually for each molecule (specified with the molecule='XYZ' option in the (I)Python shell or the -m XYZ option in the Unix shell).
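Besides the graphical check, simply counting the lines per molecule can help with this decision. The sketch below uses a made-up stand-in dictionary (molecule name mapped to an array); the line counts and the helper names count_lines and keep_molecules are purely illustrative and not Py4CAtS functions.

```python
import numpy as np

# Stand-in for a dictionary of line lists (molecule name -> array of lines).
# The real arrays would come from higstract; here only their lengths matter
# and the numbers are invented.
line_lists = {
    "H2O": np.zeros(1200),
    "CO2": np.zeros(3500),
    "O3": np.zeros(9000),
    "ClONO2": np.zeros(25000),
}


def count_lines(line_lists):
    """Return (molecule, number of lines) pairs, largest line list first."""
    return sorted(((mol, len(lines)) for mol, lines in line_lists.items()),
                  key=lambda pair: pair[1], reverse=True)


def keep_molecules(line_lists, wanted):
    """Return a new dictionary restricted to the molecules listed in `wanted`."""
    return {mol: lines for mol, lines in line_lists.items() if mol in wanted}


for mol, n in count_lines(line_lists):
    print(f"{mol:8s} {n:6d} lines")

# continue with the 'important' molecules only
important = keep_molecules(line_lists, {"H2O", "CO2", "O3"})
```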


Line wing cuts?

Currently the lbl functions of Py4CAtS do not cut the lines at a certain distance from the line center, i.e. the line profile (default Voigt) of all lines contributing to a certain wavenumber interval (see next item) is computed in this entire interval. (This is not an issue for small intervals, but clearly a problem for large intervals. Note, however, that there is no universally accepted definition / convention for the "correct" wing distance. For example, the (MT-)CKD continuum corrections for water H2O (not yet supported by Py4CAtS) are constructed assuming a cut at Δν = ±25 cm-1.)
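To get a feeling for what a wing cut would discard, one can look at a pure Lorentz profile (an assumption made here for simplicity; the default profile is Voigt, which approaches a Lorentzian in the far wings). For an area-normalized Lorentzian with half width γ, the fraction of the area beyond ±Δν is 1 - (2/π) arctan(Δν/γ). The half widths used below are assumed example values.

```python
import math


def lorentz_area_outside(cut, gamma):
    """Fraction of a normalized Lorentzian's area beyond +/- cut from the line center."""
    return 1.0 - (2.0 / math.pi) * math.atan(cut / gamma)


for gamma in (0.1, 0.01):            # assumed half widths [cm-1]
    frac = lorentz_area_outside(25.0, gamma)
    print(f"gamma = {gamma} cm-1: {100 * frac:.3f} % of the area lies beyond 25 cm-1")
```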

Selection of spectral range, contributions from lines outside the interval

In order to compute cross sections, absorption coefficients, and optical depths for some spectral range νlo ... νhi, all lines in an extended spectral range νlo - Δ ... νhi + Δ should be considered, where Δ is typically a few wavenumbers (cm-1); the actual size of Δ depends on the number, density, and strengths of lines outside 'your' interval and on further factors such as pressure. As the higstract and lbl2od (or lbl2xs) functions and corresponding scripts are completely independent, this extension is not done automatically.
(Note that the line wing cut Δν discussed in the previous item and the interval extension Δ discussed here are not strictly related, although it is reasonable to use the same value.)

Example: first collect line data in the 9.6μm spectral range in a dictionary of line lists, then compute cross sections:
(I)Python shell: lineListDict = higstract('/data/hitran/2016/lines', Interval(950,1050)+10) and then lbl2xs(lineListDict, ...)
Unix shell example: higstract -x 950,1050 -w10 .... and lbl2xs -x 950,1050 *.vSEan
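Why the extension matters can be illustrated with a small numerical sketch that does not use Py4CAtS at all: a few made-up lines with equal strength and a Lorentz profile (both assumptions), evaluated at the lower edge of the 950 ... 1050 cm-1 interval, once with only the lines inside the interval and once with the interval extended by 10 cm-1.

```python
import numpy as np


def lorentz(nu, nu0, gamma):
    """Area-normalized Lorentz profile."""
    return (gamma / np.pi) / ((nu - nu0) ** 2 + gamma ** 2)


def extend(lo, hi, delta):
    """Interval limits widened by delta on both sides."""
    return lo - delta, hi + delta


lo, hi, delta = 950.0, 1050.0, 10.0                     # interval and extension [cm-1]
gamma = 0.1                                             # assumed half width [cm-1]
positions = np.array([945.0, 949.5, 960.0, 1000.0])     # made-up line centers [cm-1]
strengths = np.ones_like(positions)                     # made-up, equal strengths


def absorption_at(nu, lo_sel, hi_sel):
    """Sum the profiles of all lines with centers within lo_sel ... hi_sel."""
    sel = (positions >= lo_sel) & (positions <= hi_sel)
    return float(np.sum(strengths[sel] * lorentz(nu, positions[sel], gamma)))


inside_only = absorption_at(lo, lo, hi)
with_extension = absorption_at(lo, *extend(lo, hi, delta))
print(f"at {lo} cm-1: {inside_only:.3e} (no extension), {with_extension:.3e} (with extension)")
```

In this constructed case the line just below the interval dominates the value at the interval edge, which is exactly the contribution lost without the extension.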


What is the difference between the -w and -W options?

Roughly speaking, -w (see also previous item) is a 'physics' option, whereas -W is related to the numerics and approximation and is only relevant for the 'multigrid' cross section algorithms: Near the line center the cross section for an individual line is evaluated on a fine wavenumber grid with a spacing essentially determined by the line width (γ = hwhm = half width @ half maximum): δν = γ / n where n is the sampling rate defined by the sampling option -s (default 5.0). Outside this line center region the (Voigt) line profile is evaluated only on a coarse grid with spacing increased by a factor two, four, or eight (according to the gridRatio or -g option). The extension of the center region is defined by the nWidths or -W option, typically ± 25γ.
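The effect of these options on the number of function evaluations for a single line can be estimated with a simplified two-grid picture. This is only a back-of-the-envelope sketch, not the actual Py4CAtS algorithm; the function name grid_sketch and its keyword names are assumptions, with defaults chosen to mirror the values mentioned above (sampling 5, grid ratio 8, ± 25γ).

```python
def grid_sketch(nu_lo, nu_hi, gamma, sampling=5.0, grid_ratio=8, n_widths=25.0):
    """Compare point counts for one line: fine grid everywhere vs. fine center plus coarse grid."""
    fine_step = gamma / sampling                 # delta nu = gamma / n
    coarse_step = grid_ratio * fine_step         # coarse grid, wider by the grid ratio
    width = nu_hi - nu_lo
    center_width = min(2.0 * n_widths * gamma, width)   # center region +/- n_widths * gamma
    all_fine = round(width / fine_step) + 1
    center_fine = round(center_width / fine_step) + 1
    coarse = round(width / coarse_step) + 1
    return {"fine_step": fine_step, "coarse_step": coarse_step,
            "all_fine": all_fine, "two_grid": center_fine + coarse}


result = grid_sketch(950.0, 1050.0, gamma=0.05)
print(result)
```

For an assumed γ = 0.05 cm-1 and a 100 cm-1 interval, the fine grid alone needs about ten thousand profile evaluations per line, while the two-grid variant needs roughly fifteen hundred.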

How long does it typically take to extract the line parameters from HITRAN/GEISA?

Thanks to a kind of bisection approach, higstract should find the first line to be accepted quickly, typically within a second (even in the ultraviolet, i.e. at the "far end" of the database). higstract will report on the first and last line found/accepted. On a modern computer reading the lines should not take more than a few seconds, and you should get a note on the completed extraction quite soon. Depending on the number of lines actually extracted, saving/writing to file/disk may take some time.
Note: If the data (input and/or output) are located on a different machine, i.e., have to be transferred over the network, things can become considerably slower!

Can I mix line parameter data from HITRAN and GEISA?

higstract collects lines for each molecule separately in individual numpy arrays. Therefore you can easily mix data from different databases, e.g., spectroscopic data for some molecules from HITRAN and for some other molecules from GEISA. Likewise, you can also use different versions of these databases. (NOTE: Do not collect data of the same molecule twice!)
(When the function higstract is called without the molecule option or with the molecule='main' option, a dictionary with line lists is returned with keys corresponding to the molecules' names. Then, two (or more) dictionaries resulting from, e.g., HITRAN and GEISA data can be "combined" with standard Python dictionary methods.)
(The Unix script higstract.py saves these arrays in individual files with extensions like .vSEan and the file base name given by the molecule name. The lbl2xs.py and related scripts do not care about the 'origin' of these .vSEan files.)
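Combining two such dictionaries in the (I)Python shell could look like the following sketch. The helper merge_line_lists is hypothetical (not a Py4CAtS function), and short placeholder lists stand in for the numpy arrays returned by higstract. The explicit check enforces the rule that no molecule should be collected twice.

```python
def merge_line_lists(first, second):
    """Combine two molecule -> line-list dictionaries, refusing duplicate molecules."""
    duplicates = sorted(set(first) & set(second))
    if duplicates:
        raise ValueError(f"molecule(s) present in both dictionaries: {duplicates}")
    return {**first, **second}


# Stand-ins for the dictionaries returned by two separate higstract calls
from_hitran = {"H2O": ["...lines..."], "CO2": ["...lines..."]}
from_geisa = {"O3": ["...lines..."]}

combined = merge_line_lists(from_hitran, from_geisa)
print(sorted(combined))
```

A plain dict.update would silently overwrite a molecule present in both dictionaries, which is why the sketch raises an error instead.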

What about continuum, e.g., (MT-)CKD for water?

Continuum corrections to the lbl cross sections and absorption coefficients are currently not implemented. (Collision induced absorption (CIA) is currently under development.)

Some of the cross section values are negative?

Some negative cross section values can show up due to problems of higher order Lagrange interpolation. Typically this happens for small y at the transition of the Gaussian line center to the Lorentzian line wings. In these cases use (default) linear interpolation:
(I)Python shell: lbl2xs(lineData, ..., lagrange=2) or
Unix shell: lbl2od -i2 .... or lbl2xs -i2 .....
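That a higher order interpolation polynomial can undershoot below zero, even though all data are positive, can be shown with a generic example that is independent of Py4CAtS: a steeply decaying, Gaussian-like function sampled on a coarse grid (grid and function are assumptions chosen for illustration) and interpolated with two and with four points.

```python
import numpy as np


def lagrange_interpolate(x_nodes, y_nodes, x):
    """Evaluate the Lagrange polynomial through the given nodes at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x_nodes, y_nodes)):
        weight = 1.0
        for j, xj in enumerate(x_nodes):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


x_nodes = np.array([0.0, 2.0, 4.0, 6.0])      # coarse grid (arbitrary units)
y_nodes = np.exp(-x_nodes ** 2)               # strictly positive, steeply decaying

x = 3.0                                       # midpoint between the two inner nodes
linear = lagrange_interpolate(x_nodes[1:3], y_nodes[1:3], x)   # 2-point (linear)
cubic = lagrange_interpolate(x_nodes, y_nodes, x)              # 4-point (cubic)
print(f"2-point: {linear:.4e}   4-point: {cubic:.4e}")
```

The two-point result stays between the neighboring values, whereas the four-point polynomial is pulled below zero by the large value at the first node.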

The optical depth file is huge – Do we really need this fine resolution?

Depending on the size of the spectral interval, the spectral region, and the altitude (pressure) of the highest atmospheric levels (top-of-atmosphere), the resulting optical depth file can become really huge, i.e., a wavenumber grid of a million points is not unusual. So is it possible to use less data points, i.e., a coarser wavenumber grid?

First, the wavenumber grid is equidistant with a spacing (x_i - x_{i-1}) automatically adjusted to the (mean) half width (hwhm γ) of all contributing lines. As pressure is decreasing with altitude (approximately p ~ exp(-z)) and pressure broadening half width is proportional to pressure, lines are becoming narrower and narrower (until Doppler broadening starts to dominate), and thus the number of grid points is increasing with altitude. The grid point spacing (and hence the number of grid points) can be influenced by means of the sampling option used by lbl2xs, lbl2ac, lbl2od: the default value of 5.0 results in a spacing δν = γ/5 (approximately).
(I)Python shell: lbl2xs(lineData, ..., sampling=5) or
Unix shell: lbl2xs -s5 ....
(and analogously for lbl2ac and lbl2od).
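How quickly the grid grows with altitude can be estimated with a rough sketch. All numbers are assumptions: a surface half width of 0.1 cm-1, a scale height of 7 km, and a fixed Doppler half width of 0.001 cm-1 that is simply used as a lower limit (a crude stand-in for the true Voigt width).

```python
import math


def grid_points(width, gamma, sampling=5.0):
    """Number of points of an equidistant grid with spacing gamma / sampling."""
    return round(width * sampling / gamma) + 1


def pressure_hwhm(z_km, gamma_surface=0.1, scale_height_km=7.0):
    """Pressure-broadened half width, assuming gamma ~ p ~ exp(-z/H)."""
    return gamma_surface * math.exp(-z_km / scale_height_km)


gamma_doppler = 0.001           # assumed Doppler half width [cm-1], used as lower limit
for z in (0, 10, 20, 30, 50):
    gamma = max(pressure_hwhm(z), gamma_doppler)
    print(f"z = {z:2d} km: hwhm = {gamma:.5f} cm-1, {grid_points(100.0, gamma)} points")
```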

So it is tempting to reduce the sampling rate. However, then it may happen that the peak(s) of some important strong line(s) fall just between some grid points and hence are "lost". If you really want to reduce the number of grid points, it is better to use a sampling with some grid points per hwhm and average the final optical depth spectra afterwards. (Note that from a physics point of view it is the radiance and transmission that should be convolved with an instrument response function.)
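The difference between sampling coarsely and averaging a finely sampled spectrum can be seen in a small numpy sketch. A single narrow Lorentzian stands in for an optical depth spectrum; the line position, width, and block size are assumptions chosen so that the line center falls between coarse grid points.

```python
import numpy as np


def lorentz(nu, nu0, gamma):
    """Area-normalized Lorentz profile."""
    return (gamma / np.pi) / ((nu - nu0) ** 2 + gamma ** 2)


gamma, nu0 = 0.01, 1000.013            # assumed narrow line, center off the coarse grid
step = gamma / 5                       # fine grid: 5 points per hwhm
block = 10                             # 10 fine points per coarse point
fine = 999.0 + step * np.arange(1000)  # fine grid covering 999 ... 1001 cm-1
od_fine = lorentz(fine, nu0, gamma)

# Option A: evaluate only on the coarse grid (every 10th fine point)
od_coarse = od_fine[::block]
# Option B: evaluate on the fine grid, then average blocks of 10 points
od_averaged = od_fine.reshape(-1, block).mean(axis=1)

area_fine = od_fine.sum() * step
area_coarse = od_coarse.sum() * step * block
area_averaged = od_averaged.sum() * step * block
print(f"peak: fine {od_fine.max():.1f}, coarse {od_coarse.max():.1f}")
print(f"area: fine {area_fine:.4f}, coarse {area_coarse:.4f}, averaged {area_averaged:.4f}")
```

Both options end up with the same number of points, but only the averaged spectrum preserves the integrated optical depth of the finely sampled one.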


© fgs   (+49-8153-28-1234)
Tue 12 Jan 2021; 10:25