Using the run_algo script
Presentation
The easiest way to use the pygeodyn algorithms is through the run_algo script. This script implements the data assimilation algorithms by taking care of chaining and looping over the steps, and of numerous practical matters such as the use of multiple processes with mpi4py and the saving of output files and logs.
The script is organised as follows:
- Initialisation of the algorithm (reading the prior and the observations, computing matrices, …)
- Initialisation of the CoreState (init_core_state: normal or from_file)
- First analysis at t_start_analysis (mandatory parameter)
- Last analysis at t_end_analysis (mandatory parameter)
- Loop over forecast times:
  - Forecast of all realisations
  - If at an analysis time and observation data is found, analysis of the ensemble of realisations is performed
  - Setting up of the analysed data for the next forecast
  - If -shear is 1, the shear at the CMB is computed from the analysis results (only at analysis times)
- Saving of the desired results in the desired format (see Configuration)
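The chaining of steps above can be sketched in a few lines of Python. This is an illustrative outline only: every name here (run_loop, the forecast and analyse callables, the dictionary layout) is hypothetical and does not belong to pygeodyn's actual API.

```python
# Simplified sketch of the forecast/analysis loop described above.
# All names are illustrative, not the actual pygeodyn API.

def run_loop(ensemble, forecast_times, analysis_times, observations,
             forecast, analyse, compute_shear=False, shear_at_cmb=None):
    """Chain forecasts and analyses over an ensemble of realisations."""
    saved = {"forecast": [], "analysed": []}
    for t in forecast_times:
        # 1. Forecast each realisation independently (this is the part
        #    that can run in parallel with mpi4py)
        ensemble = [forecast(member, t) for member in ensemble]
        saved["forecast"].append(list(ensemble))
        # 2. At analysis times, analyse the ensemble if observations exist
        if t in analysis_times and t in observations:
            ensemble = analyse(ensemble, observations[t])
            saved["analysed"].append(list(ensemble))
            # 3. Optionally compute the shear at the CMB from the analysis
            if compute_shear and shear_at_cmb is not None:
                saved.setdefault("shear", []).append(shear_at_cmb(ensemble))
    return saved

# Toy usage: states are floats, the forecast adds 1, the analysis
# nudges each realisation halfway towards the observation.
out = run_loop([0.0, 1.0], [1, 2, 3], {2}, {2: 5.0},
               forecast=lambda s, t: s + 1.0,
               analyse=lambda ens, obs: [(s + obs) / 2 for s in ens])
```

The real script does considerably more (logging, saving to HDF5, MPI scattering of realisations), but the control flow follows this shape.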
Usage
run_algo.py can be run from the console like any Python script
python3 run_algo.py -conf augkf.conf
and accepts the following arguments:
- Computation parameters
  - -conf: path to the configuration file (no default value: must be given).
  - -algo: name of the algorithm to use (default: augkf).
  - -m: number of realisations to consider (default: 20).
  - -shear: 0 to skip the shear computation, 1 to compute it (default: 0).
  - -seed: number used for the generation of random quantities. Set this to get reproducible results (default: generate a new one at each execution).
- Output files
  - -path: path where the results folder will be created (default: ~/pygeodyn_results/).
  - -cname: name of the computation, used to create a folder in -path where the results will be saved (default: Current_computation).
- Logging
  - -v: logging level: 1 (DEBUG), 2 (INFO, default), 3 (WARNING), 4 (ERROR), 5 (CRITICAL).
  - -l: name of the log file that will be created in the result folder (default: display logs in the console).
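For orientation, the documented interface can be reproduced with Python's argparse. This is a sketch built only from the list above; run_algo.py's actual parser may differ in details.

```python
import argparse

# Sketch of the documented command-line interface of run_algo.py.
# Defaults are taken from the argument list above; anything else is assumed.
def build_parser():
    p = argparse.ArgumentParser(
        description="Run a pygeodyn data assimilation algorithm")
    # Computation parameters
    p.add_argument("-conf", required=True, help="path to the configuration file")
    p.add_argument("-algo", default="augkf", help="name of the algorithm to use")
    p.add_argument("-m", type=int, default=20, help="number of realisations")
    p.add_argument("-shear", type=int, choices=(0, 1), default=0,
                   help="1 to compute the shear at the CMB")
    p.add_argument("-seed", type=int, default=None,
                   help="seed for random quantities (reproducibility)")
    # Output files
    p.add_argument("-path", default="~/pygeodyn_results/",
                   help="where the results folder is created")
    p.add_argument("-cname", default="Current_computation",
                   help="name of the computation (results subfolder)")
    # Logging
    p.add_argument("-v", type=int, default=2,
                   help="1 DEBUG, 2 INFO, 3 WARNING, 4 ERROR, 5 CRITICAL")
    p.add_argument("-l", default=None,
                   help="log file name (default: log to the console)")
    return p

# Parsing the minimal example from the Usage section
args = build_parser().parse_args(["-conf", "augkf.conf", "-m", "5"])
```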
Configuration file
pygeodyn uses configuration files (extension: .conf) to set the computation parameters. For a quick overview, see example.conf, an example of such a configuration file. A complete description of the format is given in the Configuration part.
Algorithm
As of now, there is one available algorithm: augkf (Augmented state Kalman Filter implemented in Python). Another algorithm, zonkf (a variation of AugKF with zonal coefficients treated separately), has been archived. See the presentations of AugKF and ZonKF for more information.
Realisations
The algorithm implemented in pygeodyn is an ensemble Kalman filter: a Kalman filter that works on an ensemble of realisations of the core state. These can be considered as random draws around a mean core state. For more information on the ensemble Kalman filter, see Evensen G. (2009) Data Assimilation: The Ensemble Kalman Filter.
Each realisation is forecast and analysed on its own (allowing parallel computation: see the Example with MPI below). The parameter -m sets the number of realisations (hereafter nb_reals) and therefore has a major impact on both the runtime of the algorithm and the quality of the results.
For an AR-1 process, 20 realisations is a minimum, and more than 100 are advised if higher resolution is needed.
For an AR-3 process, 100 realisations is a minimum, and more than 300 are advised if higher resolution is needed; in addition, Principal Component Analysis (PCA) is required (an N_pca_u value of 100 is advised).
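The idea of realisations as random draws around a mean state can be pictured with a minimal numpy sketch. This is illustrative only: pygeodyn derives its realisations from the prior's covariance matrices, not from this toy helper, and all names below are hypothetical.

```python
import numpy as np

def draw_ensemble(mean_state, covariance, nb_reals, seed=None):
    """Draw nb_reals realisations around a mean core state.

    Toy illustration of an ensemble: Gaussian draws around a mean,
    made reproducible by fixing the seed (cf. the -seed argument).
    """
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean_state, covariance, size=nb_reals)

# Toy usage: 20 realisations of a 3-coefficient state
ens = draw_ensemble(np.zeros(3), np.eye(3), nb_reals=20, seed=15)
```

With a fixed seed, two runs produce the same ensemble, which is exactly what the -seed argument is for.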
Output files
Output folder
When a new computation is launched, pygeodyn creates a folder named -cname (default: Current_computation) at the location given by -path (default: ~/pygeodyn_results/). All result files will be created in this output folder (including the log file if -l was given).
Result files
HDF5 format
pygeodyn saves the specified results in the specified format (see Configuration) at each time iteration, in an HDF5 file named after -cname. HDF5 (Hierarchical Data Format) files organise data in a directory-like structure and can be read and manipulated with a variety of tools, including the Python package h5py used in pygeodyn.
The possible groups are:
- analysed: group containing the core states saved after each analysis
- forecast: group containing the core states saved after each forecast
- computed: group containing the core states on which the computation steps are performed. Equal to forecast, except at analysis times where it is replaced by the analysed data used for the next forecast.
- misfits: group containing the misfits saved after each analysis
Each of these groups contains at least:
- ER: Gauss coefficients of the subgrid errors (dim: nb_reals x nb_times x Nsv)
- MF: Gauss coefficients of the magnetic field (dim: nb_reals x nb_times x Nb)
- SV: Gauss coefficients of the secular variation (dim: nb_reals x nb_times x Nsv)
- U: Gauss coefficients of the core flow at the CMB (dim: nb_reals x nb_times x Nu2)
- times: array of dates stored in decimal representation (dim: nb_times)
If -shear is 1, the analysed group also contains:
- S: Gauss coefficients of the shear at the CMB (dim: nb_reals x nb_times x Nu2)
If AR_type (config parameter) is AR3, the analysed group also contains:
- dUdt: Gauss coefficients of the first time derivative of U (dim: nb_reals x nb_times x Nu2)
- d2Udt2: Gauss coefficients of the second time derivative of U (dim: nb_reals x nb_times x Nu2)
- dERdt: Gauss coefficients of the first time derivative of ER (dim: nb_reals x nb_times x Nsv)
- d2ERdt2: Gauss coefficients of the second time derivative of ER (dim: nb_reals x nb_times x Nsv)
The variable nb_times depends on the group. It is equal to:
- nb_analyses for analysed and misfits
- nb_forecasts for computed and forecast
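A result file with this layout can be explored with h5py. The snippet below is a sketch: it builds a tiny in-memory file that only mimics the documented group/dataset structure (names taken from the lists above, sizes invented), then reads it back the way one would read a real output file.

```python
import numpy as np
import h5py

# Tiny in-memory file mimicking the documented layout
# (driver='core' with backing_store=False never touches the disk).
nb_reals, nb_analyses, Nb = 3, 4, 8
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("analysed")
    grp.create_dataset("MF", data=np.zeros((nb_reals, nb_analyses, Nb)))
    grp.create_dataset("times", data=np.linspace(2000.0, 2003.0, nb_analyses))

    # Reading back: groups behave like directories, datasets like arrays
    mf = f["analysed/MF"][()]        # ensemble of magnetic-field coefficients
    times = f["analysed/times"][()]  # dates in decimal representation
    mean_mf = mf.mean(axis=0)        # ensemble mean over the realisations
```

Reading a real pygeodyn output works the same way, with the file path pointing at the -cname file in the output folder.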
Examples
- Minimal example:
python3 run_algo.py -conf augkf.conf
This will launch a computation using the parameters stored in augkf.conf and with other arguments set to their default values.
- Example with all arguments given:
python3 run_algo.py -conf augkf.conf -m 5 -algo 'augkf' -seed 15 -path './results/' -cname 'AugKF' -v 4 -l 'log.txt'
This will launch an augkf computation on 5 realisations using the parameters stored in augkf.conf and a seed of 15 to initialise random values. The results will be saved in ./results/AugKF, as will the log file log.txt, written at ERROR level.
- Example with MPI
Forecasts can be performed in parallel using mpi4py. For this, use the mpiexec command and set the number of processes. The other arguments can be set as before.
mpiexec -np 4 python3 -m mpi4py run_algo.py -conf augkf.conf
mpiexec -np 2 python3 -m mpi4py run_algo.py -v 3 -conf augkf.conf -path './results/' -m 3 -cname 'AugKF' -algo 'augkf'
To go further
pygeodyn offers the capability to run the supplied algorithms on custom priors (from which the covariance matrices derive) and/or custom observations. This can be done by creating new prior and observation types and supplying their reading methods.
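The general idea of supplying a reading method for a new type can be illustrated with a small registry pattern. None of the names below belong to pygeodyn's actual extension API; this is only a generic sketch of the mechanism.

```python
# Generic illustration of registering a reading method for a new prior
# type; all names here are hypothetical, not pygeodyn's API.
READERS = {}

def register_prior_reader(name):
    """Decorator associating a prior type name with its reading method."""
    def wrap(fn):
        READERS[name] = fn
        return fn
    return wrap

@register_prior_reader("my_prior")
def read_my_prior(path):
    # A real reader would load Gauss coefficients and build the
    # covariance matrices from the files at `path`; this is a placeholder.
    return {"source": path, "coefficients": []}

# Dispatch by type name, as a configuration file might request
prior = READERS["my_prior"]("/data/my_prior")
```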