#############################
Using the ``run_algo`` script
#############################

Presentation
============

The easiest way to use the pygeodyn algorithms is through the ``run_algo`` script. This script implements data assimilation algorithms by taking care of chaining and looping the steps, as well as numerous handy things such as the use of multiple processes with mpi4py and the saving of output files and logs.

The script is organised as follows:

* Initialisation of the algorithm (reading the prior and the observations, computing matrices, ...)
* Initialisation of the `CoreState`_ (``init_core_state``: *normal* or *from_file*)
* First analysis at ``t_start_analysis`` (mandatory parameter)
* Last analysis at ``t_end_analysis`` (mandatory parameter)
* Loop on forecast times:

  * Forecast of all realisations
  * If observation data is found at an analysis time, analysis of the ensemble of realisations is performed
  * Set-up of the analysed data for the next forecast
  * If ``-shear`` is 1, computation of the shear at the CMB from the analysis results (only at analysis times)

* Saving of the desired results in the desired format (see `Configuration`_)

Usage
=====

``run_algo.py`` can be run from the console as any Python script

.. code-block:: bash

    python3 run_algo.py -conf augkf.conf

and accepts the following arguments:

* Computation parameters

  * ``-conf``: path to the `configuration file`_ (no default value: must be given).
  * ``-algo``: name of the `algorithm`_ to use (default: ``augkf``).
  * ``-m``: number of `realisations`_ to consider (default: 20).
  * ``-shear``: 0 to skip the shear computation, 1 to compute it (default: 0).
  * ``-seed``: number used for the generation of random quantities. Set it to get reproducible results (default: generate a new seed at each execution).

* `Output files`_

  * ``-path``: path where the result folder will be created (default: ``~/pygeodyn_results/``).
  * ``-cname``: name of the computation, used to create a folder in ``-path`` where the results will be saved (default: ``Current_computation``).

* Logging

  * ``-v``: logging level: 1 (DEBUG), 2 (INFO, default), 3 (WARNING), 4 (ERROR), 5 (CRITICAL).
  * ``-l``: name of the log file that will be created in the result folder (default: display logs in the console).

Configuration file
------------------

pygeodyn uses configuration files (extension: ``.conf``) to set the computation parameters. For a quick overview, see ``example.conf``, an example of such a configuration file. A complete description of the format is given in the `Configuration`_ part.

Algorithm
---------

As of now, there is one available algorithm: **augkf** (Augmented state Kalman Filter implemented in Python). Another algorithm, **zonkf** (a variation of AugKF with the zonal coefficients treated separately), has been archived. See the presentations of `AugKF`_ and `ZonKF`_ for more information.

Realisations
------------

The algorithm implemented in pygeodyn is an ensemble Kalman filter: a Kalman filter that works on an ensemble of realisations of the core state. These can be considered as random draws around a mean core state. For more information on the ensemble Kalman filter, see Evensen G. (2009), *Data Assimilation: The Ensemble Kalman Filter*.

Each realisation is forecast and analysed on its own, which allows parallel computation (see the *Example with MPI* below). The parameter ``-m`` sets the number of realisations (hereafter *nb_reals*) and therefore has a huge impact on both the runtime of the algorithm and the quality of the results. For an AR-1 process, 20 realisations is a minimum and a value greater than 100 is advised if higher resolution is needed. For an AR-3 process, 100 realisations is a minimum and a value greater than 300 is advised if higher resolution is needed; Principal Component Analysis (PCA) is then also needed (an ``N_pca_u`` value of 100 is advised).
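To illustrate the ensemble idea, here is a minimal, self-contained sketch of one forecast/analysis cycle on an ensemble of realisations. It is **not** pygeodyn's actual AugKF implementation: the AR-1 step, the observation, and the scalar gain below are toy stand-ins, and the dimensions are arbitrary.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=15)  # a fixed seed makes the run reproducible
    nb_reals, dim = 20, 4                 # toy state dimension, not a real core state size

    # Ensemble of realisations: random draws around a mean core state
    mean_state = np.zeros(dim)
    ensemble = mean_state + rng.normal(scale=1.0, size=(nb_reals, dim))

    def forecast(x):
        """Toy AR-1 step: damped state plus random forcing (stand-in model)."""
        return 0.9 * x + rng.normal(scale=0.1, size=x.shape)

    # Each realisation is forecast independently (hence parallelisable)
    ensemble = np.array([forecast(x) for x in ensemble])

    # Toy analysis: pull every realisation toward a synthetic observation
    obs = np.ones(dim)
    gain = 0.5                            # scalar stand-in for the Kalman gain
    analysed = ensemble + gain * (obs - ensemble)

Because each realisation is forecast on its own, the loop over realisations is what pygeodyn distributes over MPI processes.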
Output files
------------

Output folder
^^^^^^^^^^^^^

When a new computation is launched, pygeodyn creates a folder named ``-cname`` (default: ``Current_computation``) at the location given in ``-path`` (default: ``~/pygeodyn_results/``). All result files will be created in this output folder (including the log file if ``-l`` was given).

Result files
^^^^^^^^^^^^

HDF5 format
'''''''''''

Pygeodyn saves the specified results in the specified format (see `Configuration`_) at each time iteration in an HDF5 file named ``-cname``. HDF5 (Hierarchical Data Format) files organise the data in a directory-like structure and can be read and manipulated with a variety of tools, including the Python package ``h5py`` used in pygeodyn. The possible groups are:

- **analysed**: group containing the core states saved after each analysis
- **forecast**: group containing the core states saved after each forecast
- **computed**: group containing the core states on which the computation steps are performed. Equal to **forecast** except at analysis times, where it is replaced by the analysed data that is used for the next forecast.
- **misfits**: group containing the misfits saved after each analysis

Each of these groups contains at least:

- **ER**: Gauss coefficients of the subgrid errors (dim: *nb_reals* x *nb_times* x *Nsv*)
- **MF**: Gauss coefficients of the magnetic field (dim: *nb_reals* x *nb_times* x *Nb*)
- **SV**: Gauss coefficients of the secular variation (dim: *nb_reals* x *nb_times* x *Nsv*)
- **U**: Gauss coefficients of the core flow at the CMB (dim: *nb_reals* x *nb_times* x *Nu2*)
- **times**: array of dates in decimal representation (dim: *nb_times*)

If ``-shear`` is 1, the **analysed** group also contains:

- **S**: Gauss coefficients of the shear at the CMB (dim: *nb_reals* x *nb_times* x *Nu2*)

If ``AR_type`` (config parameter) is AR3, the **analysed** group also contains:

- **dUdt**: Gauss coefficients of the first time derivative of **U** (dim: *nb_reals* x *nb_times* x *Nu2*)
- **d2Udt2**: Gauss coefficients of the second time derivative of **U** (dim: *nb_reals* x *nb_times* x *Nu2*)
- **dERdt**: Gauss coefficients of the first time derivative of **ER** (dim: *nb_reals* x *nb_times* x *Nsv*)
- **d2ERdt2**: Gauss coefficients of the second time derivative of **ER** (dim: *nb_reals* x *nb_times* x *Nsv*)

The variable *nb_times* depends on the group. It is equal to:

* *nb_analyses* for **analysed** and **misfits**
* *nb_forecasts* for **computed** and **forecast**

Examples
========

* Minimal example:

  .. code-block:: bash

      python3 run_algo.py -conf augkf.conf

  This will launch a computation using the parameters stored in *augkf.conf*, with all other arguments set to their default values.

* Example with all arguments given:

  .. code-block:: bash

      python3 run_algo.py -conf augkf.conf -m 5 -algo 'augkf' -seed 15 -path './results/' -cname 'AugKF' -v 4 -l 'log.txt'

  This will launch an *augkf* computation on 5 realisations, using the parameters stored in *augkf.conf* and a seed of 15 to initialise random values. The results will be saved in ``./results/AugKF``, as will the log file ``log.txt`` at ERROR level.

* Example with MPI:

  Forecasts can be performed in parallel using mpi4py. For this, use the ``mpiexec`` command and set the number of processes. The other arguments can be set as before.

  .. code-block:: bash

      mpiexec -np 4 python3 -m mpi4py run_algo.py -conf augkf.conf
      mpiexec -np 2 python3 -m mpi4py run_algo.py -v 3 -conf augkf.conf -path './results/' -m 3 -cname 'AugKF' -algo 'augkf'

To go further
-------------

pygeodyn offers the capability to run the supplied algorithms on custom priors (from which the covariance matrices derive) and/or custom observations. This can be done by creating `new prior and observations types`_ and supplying their reading methods.

.. _AugKF: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/augkf.html
.. _ZonKF: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/zonkf.html
.. _init_algorithm: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.run.html
.. _ComputationConfig: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.inout.config.html
.. _Configuration: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/configuration.html
.. _CoreState: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/usage_corestate.html
.. _new prior and observations types: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/usage_new_types.html
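The HDF5 layout described in the *Result files* section can be explored with ``h5py``. The sketch below first writes a toy file mimicking that layout, then reads it back the way one would read a real result file; the file name, the dimensions, and the values are illustrative only (a real file sits in the output folder and contains the full set of groups and datasets).

.. code-block:: python

    import h5py
    import numpy as np

    nb_reals, nb_analyses, Nb = 3, 5, 8   # toy dimensions, not real pygeodyn sizes

    # Write a small file mimicking the documented layout (illustration only)
    with h5py.File("toy_results.hdf5", "w") as f:
        analysed = f.create_group("analysed")
        analysed.create_dataset("MF", data=np.zeros((nb_reals, nb_analyses, Nb)))
        analysed.create_dataset("times", data=np.linspace(1990.0, 2000.0, nb_analyses))

    # Read it back: groups behave like directories, datasets like arrays
    with h5py.File("toy_results.hdf5", "r") as f:
        mf = f["analysed/MF"][:]          # Gauss coefficients of the magnetic field
        times = f["analysed/times"][:]    # dates in decimal representation

The same pattern (``f["group/dataset"][:]``) applies to every group (**forecast**, **computed**, **misfits**) and dataset (**SV**, **U**, **ER**, ...) listed above.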