Using the run_algo script

Presentation

The easiest way to use the pygeodyn algorithms is through the run_algo script. This script runs the data assimilation algorithms, taking care of the chaining and looping of the forecast and analysis steps, as well as of practical matters such as parallel execution with mpi4py and the saving of output files and logs.

The script is organised as follows:

  • Initialisation of the algorithm (reading prior, observations, computing matrices,…)

  • Initialisation of the CoreState (init_core_state: normal or from_file)

  • First analysis at t_start_analysis (mandatory parameter)

  • Last analysis at t_end_analysis (mandatory parameter)

  • Loop on forecast times:
    • Forecast of all realisations

    • If the current time is an analysis time and observation data is found, an analysis of the ensemble of realisations is performed

    • Set up the analysed data for the next forecast

  • If -shear is 1, the shear at the CMB is computed from the analysis results (only at analysis times)

  • Saving the desired results with the desired format (see Configuration)
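The control flow above can be sketched in a few lines of Python. All function and variable names in this sketch (forecast_step, analyse, etc.) are hypothetical placeholders, not the actual pygeodyn API; the snippet only illustrates how forecasts, analyses and the "computed" states chain together.

```python
# Illustrative sketch of the run_algo control flow.
# All names here are hypothetical placeholders, NOT the actual pygeodyn API.

def forecast_step(state, t):
    # Stub: advance one realisation to time t.
    return state + 1

def analyse(states, obs):
    # Stub: analysis step pulling each realisation towards the ensemble mean.
    mean = sum(states) / len(states)
    return [0.5 * (s + mean) for s in states]

def run(states, forecast_times, analysis_times, observations):
    results = {"forecast": [], "analysed": [], "computed": []}
    for t in forecast_times:
        # Forecast every realisation independently (parallelisable with MPI).
        states = [forecast_step(s, t) for s in states]
        results["forecast"].append(list(states))
        # Analyse only if t is an analysis time AND observation data exists.
        if t in analysis_times and t in observations:
            states = analyse(states, observations[t])
            results["analysed"].append(list(states))
        # "computed" equals the forecast, except at analysis times, where the
        # analysed states replace it and feed the next forecast.
        results["computed"].append(list(states))
    return results
```

Note how the "computed" list differs from "forecast" only at analysis times, mirroring the description of the HDF5 groups in the Output files section.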

Usage

run_algo.py can be run from the console like any Python script

python3 run_algo.py -conf augkf.conf

and accepts the following arguments:

  • Computation parameters
    • -conf: Path to the configuration file (no default value: must be given).

    • -algo: the name of the algorithm to use (default: augkf).

    • -m: the number of realisations to consider (default: 20).

    • -shear: 0 to skip the shear computation, 1 to compute the shear (default: 0).

    • -seed: a number used for the generation of random quantities. Set this to get reproducible results (default: a new seed is generated at each execution).

  • Output files
    • -path: path where the results folder will be created (default: ~/pygeodyn_results/).

    • -cname: name of the computation. Used to create a folder in -path where the results will be saved (default: Current_computation).

  • Logging
    • -v: logging level: 1 (DEBUG), 2 (INFO, default), 3 (WARNING), 4 (ERROR), 5 (CRITICAL).

    • -l: name of the log file that will be created in the result folder (default: display logs in the console).
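For scripted or batch runs, the command line can also be assembled and launched from Python, for instance with subprocess. The flag names are those documented above; the paths and values in this snippet are purely illustrative.

```python
import subprocess

# Assemble a run_algo call from the flags documented above.
# Paths and values here are illustrative.
cmd = [
    "python3", "run_algo.py",
    "-conf", "augkf.conf",   # mandatory: path to the configuration file
    "-m", "50",              # number of realisations
    "-seed", "15",           # fixed seed for reproducible random draws
    "-path", "./results/",
    "-cname", "my_run",
    "-v", "2",               # INFO-level logging
]
# subprocess.run(cmd, check=True)  # uncomment to actually launch the run
print(" ".join(cmd))
```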

Configuration file

pygeodyn uses configuration files (extension: .conf) to set the computation parameters. For a quick overview, see example.conf that is an example of such a configuration file. A complete description of the format is given in the Configuration part.

Algorithm

At present, one algorithm is available: augkf (Augmented state Kalman Filter implemented in Python). Another algorithm, zonkf (a variation of AugKF in which zonal coefficients are treated separately), has been archived. See the presentations of AugKF and ZonKF for more information.

Realisations

The algorithm implemented in pygeodyn is an ensemble Kalman filter: a Kalman filter that works on an ensemble of realisations of the core state. These can be considered as random draws around a mean core state. For more information on the ensemble Kalman filter, see Evensen G. (2009) Data Assimilation: The Ensemble Kalman Filter.

Each realisation is forecast and analysed on its own (allowing parallel computation: see the Example with MPI below). The parameter -m sets the number of realisations (hereafter called nb_reals) and therefore strongly affects both the runtime of the algorithm and the quality of the results.
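The analysis of an ensemble can be illustrated with a minimal perturbed-observation ensemble Kalman filter update in the textbook form (after Evensen 2009). This is an illustrative sketch only, not pygeodyn's implementation; all dimensions and values in the toy example are arbitrary.

```python
import numpy as np

# Minimal perturbed-observation EnKF analysis step (textbook form, after
# Evensen 2009). Illustrative only -- NOT pygeodyn's implementation.
rng = np.random.default_rng(15)

def enkf_analysis(Xf, y, H, R):
    """Update a forecast ensemble Xf (n_state x m) with observation y (n_obs,)."""
    m = Xf.shape[1]
    A = Xf - Xf.mean(axis=1, keepdims=True)         # ensemble anomalies
    Pf = A @ A.T / (m - 1)                          # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    # Each realisation is updated on its own, against its own perturbed
    # observation drawn with the observation-error covariance R.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=m).T
    return Xf + K @ (Y - H @ Xf)

# Toy example: 3-dimensional state, 2 observed components, 20 realisations.
Xf = rng.normal(size=(3, 20))
H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R = 0.1 * np.eye(2)
Xa = enkf_analysis(Xf, np.array([1.0, -1.0]), H, R)
```

Because every realisation is updated independently given the gain, the per-realisation work (forecast and update) distributes naturally over MPI processes.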

For an AR-1 process, 20 realisations is a minimum and a value greater than 100 is advised if higher resolution is needed.

For an AR-3 process, 100 realisations is a minimum and a value greater than 300 is advised if higher resolution is needed; in addition, Principal Component Analysis (PCA) is required (a N_pca_u value of 100 is advised).

Output files

Output folder

When a new computation is launched, pygeodyn creates a folder of name -cname (default: Current_computation) at the location given in -path (default: ~/pygeodyn_results/). All result files will be created in this output folder (including the log file if -l was given).

Result files

HDF5 format

Pygeodyn saves the specified results in the specified format (see Configuration) at each time iteration, in an HDF5 file named after -cname. HDF5 files can be read and manipulated with a variety of tools, including the Python package h5py used in pygeodyn. HDF5 (Hierarchical Data Format) organises data in a directory-like structure of groups and datasets.

The possible groups are:

  • analysed: Group containing the core states saved after each analysis

  • forecast: Group containing the core states saved after each forecast

  • computed: Group containing the core states on which computation steps are performed. Equal to forecast, except at analysis times, where the forecast data is replaced by the analysed data used for the next forecast.

  • misfits: Group containing the misfits saved after each analysis

Each of these groups contains at least:

  • ER: Gauss coefficients of the subgrid errors (dim: nb_reals x nb_times x Nsv)

  • MF: Gauss coefficients of the magnetic field (dim: nb_reals x nb_times x Nb)

  • SV: Gauss coefficients of the secular variation (dim: nb_reals x nb_times x Nsv)

  • U: Gauss coefficients of the core flow at the CMB (dim: nb_reals x nb_times x Nu2)

  • times: Array of dates stored with decimal representation (dim: nb_times)

If -shear is 1, the analysed group also contains S: Gauss coefficients of the shear at the CMB (dim: nb_reals x nb_times x Nu2)

If the config parameter AR_type is AR3, the analysed group also contains:

  • dUdt: Gauss coefficients of the first time derivative of U (dim: nb_reals x nb_times x Nu2)

  • d2Udt2: Gauss coefficients of the second time derivative of U (dim: nb_reals x nb_times x Nu2)

  • dERdt: Gauss coefficients of the first time derivative of ER (dim: nb_reals x nb_times x Nsv)

  • d2ERdt2: Gauss coefficients of the second time derivative of ER (dim: nb_reals x nb_times x Nsv)

The variable nb_times depends on the group. It is equal to:

  • nb_analyses for analysed and misfits

  • nb_forecasts for computed and forecast
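Result files following this layout can be inspected with h5py, indexing datasets like numpy arrays. The snippet below builds a small mock file in memory with the same group/dataset structure and reads it back; the file name and dimensions are illustrative, not pygeodyn defaults.

```python
import numpy as np
import h5py

# Illustrative dimensions (not pygeodyn defaults).
nb_reals, nb_analyses, Nb = 20, 4, 224

# Build a small in-memory mock file with the layout described above.
with h5py.File("mock.h5", "w", driver="core", backing_store=False) as f:
    g = f.create_group("analysed")
    g.create_dataset("MF", data=np.zeros((nb_reals, nb_analyses, Nb)))
    g.create_dataset("times", data=np.linspace(1950.0, 2015.0, nb_analyses))

    # Reading back: datasets are indexed like numpy arrays.
    MF = f["analysed"]["MF"][...]      # nb_reals x nb_times x Nb
    mean_MF = MF.mean(axis=0)          # ensemble mean (nb_times x Nb)
    times = f["analysed"]["times"][...]
```

A real result file would be opened with h5py.File(path, "r") instead of the in-memory mock used here.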

Examples

  • Minimal example:
    python3 run_algo.py -conf augkf.conf
    

    This will launch a computation using the parameters stored in augkf.conf and with other arguments set to their default values.

  • Example with all arguments given:
    python3 run_algo.py -conf augkf.conf -m 5 -algo 'augkf' -seed 15 -path './results/' -cname 'AugKF' -v 4 -l 'log.txt'
    

    This will launch an augkf computation on 5 realisations, using the parameters stored in augkf.conf and a seed of 15 to initialise random values. The results will be saved in ./results/AugKF, along with the log file log.txt written at ERROR level.

  • Example with MPI

    Forecasts can be performed in parallel using mpi4py. For this, launch the script with the mpiexec command and set the number of processes with -np. The other arguments can be set as before.

    mpiexec -np 4 python3 -m mpi4py run_algo.py -conf augkf.conf
    mpiexec -np 2 python3 -m mpi4py run_algo.py -v 3 -conf augkf.conf -path './results/' -m 3 -cname 'AugKF' -algo 'augkf'
    

To go further

pygeodyn offers the capability to run the supplied algorithms on custom priors (from which the covariance matrices are derived) and/or custom observations. This can be done by creating new prior and observation types and supplying their reading methods.