DA-DAPPER 1.2.2 on PyPI

DAPPER is a set of templates for benchmarking the performance of data assimilation (DA) methods. The numerical experiments provide support and guidance for new developments in DA. The typical set-up is a synthetic (twin) experiment, where you specify a dynamic model and an observational model, and use these to generate a synthetic truth (multivariate time series), and then estimate that truth given the models and noisy observations.
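The twin-experiment workflow described above can be sketched in plain NumPy, without DAPPER itself. A scalar linear dynamic model generates the synthetic truth, a noisy observational model generates the observations, and a Kalman filter estimates the truth from them. The model coefficients (`a`, `Q`, `R`) and the function name `twin_experiment` are illustrative assumptions, not part of DAPPER's API.

```python
import numpy as np

def twin_experiment(K=500, a=0.95, Q=0.1, R=0.5, seed=0):
    """Synthetic (twin) experiment for a scalar linear-Gaussian model.

    Dynamic model:       x[k] = a * x[k-1] + sqrt(Q) * noise
    Observational model: y[k] = x[k] + sqrt(R) * noise
    """
    rng = np.random.default_rng(seed)

    # 1. Generate the synthetic truth (a time series).
    xx = np.zeros(K + 1)
    xx[0] = rng.normal()
    for k in range(1, K + 1):
        xx[k] = a * xx[k - 1] + np.sqrt(Q) * rng.normal()

    # 2. Generate noisy observations of the truth (none at k=0).
    yy = xx[1:] + np.sqrt(R) * rng.normal(size=K)

    # 3. Estimate the truth from the models and observations (Kalman filter).
    mu, P = 0.0, 1.0  # prior mean and variance of x[0]
    xa = np.zeros(K)
    for k in range(K):
        mu, P = a * mu, a * a * P + Q      # forecast step
        gain = P / (P + R)                 # Kalman gain
        mu = mu + gain * (yy[k] - mu)      # analysis mean
        P = (1 - gain) * P                 # analysis variance
        xa[k] = mu
    return xx[1:], yy, xa

truth, obs, estimate = twin_experiment()
rmse_obs = np.sqrt(np.mean((obs - truth) ** 2))
rmse_kf = np.sqrt(np.mean((estimate - truth) ** 2))
print(f"RMSE of raw observations: {rmse_obs:.3f}")
print(f"RMSE of KF analysis:      {rmse_kf:.3f}")
```

Because the truth is known, the quality of the estimate can be scored directly. That is the point of a synthetic experiment.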

Getting started

Install, then read, run and try to understand examples/basic_{1,2,3}.py. Some of the examples can also be opened in Jupyter, and thereby run in the cloud (i.e. without installation, but requiring a Google login). This screencast provides an introduction. The documentation includes general guidelines and the API, but for any serious use you will want to read and adapt the code yourself. If you use DAPPER in a publication, please cite it, e.g., "The experiments used (inspiration from) DAPPER [ref], version 1.6.0", where [ref] points to . Lastly, for an introduction to DA theory also using Python, see these tutorials.

Highlights

DAPPER enables the numerical investigation of DA methods through a variety of typical test cases and statistics. It (a) reproduces numerical benchmark results reported in the literature, and (b) facilitates comparative studies, thus promoting the (a) reliability and (b) relevance of the results. For example, this figure, generated by examples/basic_3.py using built-in tools for experiment and result management, reproduces figure 5.7 of these lecture notes.

DAPPER is (c) open source, written in Python, and (d) focuses on readability; this promotes the (c) reproduction and (d) dissemination of the underlying science, and makes it easy to adapt and extend.

It also illustrates how to parallelise ensemble forecasts (e.g. the QG model), local analyses (e.g. the LETKF), and independent experiments (e.g. examples/basic_3.py). It comes with a battery of diagnostics and statistics. These are all averaged over subdomains (e.g. "ocean" and "land") and then in time. Confidence intervals are computed, including a correction for auto-correlation, and are used for uncertainty quantification and for printing only the significant digits. Several diagnostics are included in the on-line "liveplotting" illustrated below, which may be paused for further interactive inspection.
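The averaging and confidence-interval steps can be sketched as follows. This illustrates the idea under stated assumptions and is not DAPPER's own implementation. The "ocean"/"land" split is a hypothetical boolean mask, the auto-correlation correction uses a simple AR(1) effective-sample-size approximation, and the helper names (`spatial_rmse`, `mean_with_ci`, `format_sig`) are made up for this example.

```python
import numpy as np

def spatial_rmse(err, mask):
    """RMSE over the grid points selected by `mask`, one value per time step."""
    return np.sqrt(np.mean(err[:, mask] ** 2, axis=1))

def mean_with_ci(series, z=1.96):
    """Time mean and ~95% confidence half-width, widened for auto-correlation.

    AR(1) approximation: N_eff = N * (1 - r) / (1 + r), where r is the
    lag-1 auto-correlation of the series.
    """
    series = np.asarray(series, dtype=float)
    N = len(series)
    mean = series.mean()
    anom = series - mean
    if np.allclose(anom, 0.0):
        return mean, 0.0  # constant series: no spread, no uncertainty
    r = np.sum(anom[:-1] * anom[1:]) / np.sum(anom ** 2)
    r = min(max(r, 0.0), 0.99)  # ignore negative r; keep N_eff away from 0
    N_eff = N * (1 - r) / (1 + r)
    half_width = z * series.std(ddof=1) / np.sqrt(N_eff)
    return mean, half_width

def format_sig(mean, half_width):
    """Format the mean with only as many decimals as the CI supports."""
    if half_width <= 0:
        return f"{mean}"
    decimals = max(0, -int(np.floor(np.log10(half_width))))
    return f"{mean:.{decimals}f} ±{half_width:.{decimals}f}"

# Usage with synthetic, auto-correlated errors (a stand-in for estimate - truth).
rng = np.random.default_rng(1)
K, Nx = 1000, 40
err = np.zeros((K, Nx))
for k in range(1, K):
    err[k] = 0.9 * err[k - 1] + 0.3 * rng.normal(size=Nx)
land = np.zeros(Nx, dtype=bool)
land[:10] = True  # hypothetical: the first 10 grid points are "land"

results = {}
for name, mask in {"ocean": ~land, "land": land}.items():
    results[name] = mean_with_ci(spatial_rmse(err, mask))  # space, then time
    print(name, format_sig(*results[name]))
```

Ignoring the auto-correlation would treat the 1000 time steps as independent samples and yield an overconfident (too narrow) interval, and hence too many printed digits.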

In summary, DAPPER is well suited for teaching and fundamental DA research. Also see its drawbacks.

Installation

Successfully tested on Linux/Mac/Windows.

Prerequisite: Python>=3.9

If you're an expert, set up a Python environment however you like. Otherwise, install Anaconda, then open the Anaconda terminal and run the following commands:

conda create --yes --name dapper-env python=3.9
conda activate dapper-env
python --version

Ensure the printed version is 3.9 or higher.
Keep using the same terminal for the commands below.

Install

Either: Install for development (recommended)

Do you want the DAPPER code available to play around with? Then

Download and unzip (or git clone) DAPPER.
Move the resulting folder wherever you like,
and cd into it (ensure you're in the folder with a setup.py file).
pip install -e '.[dev]'
You can omit [dev] if you don't need to do serious development.

Or: Install as library

Do you just want to run a script that requires DAPPER? Then

If the script comes with a requirements.txt file, then do
pip install -r path/to/requirements.txt.
If not, hopefully you know the version of DAPPER needed. Run
pip install dapper==1.5.1 to get version 1.5.1 (as an example).

Finally: Test the installation

You should now be able to run your script with python path/to/script.py.
For example, if you are in the DAPPER dir,

python examples/basic_1.py

PS: If you closed the terminal (or shut down your computer), you'll first need to run conda activate dapper-env

DA methods

Method	Literature reproduced
EnKF ¹	Sakov08, Hoteit15, Grudzien2020
EnKF-N	Bocquet12, Bocquet15
EnKS, EnRTS	Raanes2016
iEnKS / iEnKF / EnRML / ES-MDA ²	Sakov12, Bocquet12, Bocquet14
LETKF, local & serial EAKF	Bocquet11
Sqrt. model noise methods	Raanes2014
Particle filter (bootstrap) ³	Bocquet10
Optimal/implicit Particle filter ³	Bocquet10
NETF	Tödter15, Wiljes16
Rank histogram filter (RHF)	Anderson10
4D-Var
3D-Var
Extended KF
Optimal interpolation
Climatology

¹: Stochastic, DEnKF (i.e. half-update), ETKF (i.e. sym. sqrt.). Serial forms are also available.
Tuned with inflation and "random, orthogonal rotations".
²: Also supports the bundle version, and "EnKF-N"-type inflation.
³: Resampling: multinomial (including systematic/universal and residual).
The particle filter is tuned with "effective-N monitoring", "regularization/jittering" strength, and more.
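Two of the building blocks named in the footnotes can be sketched in a few lines of NumPy. The first is a stochastic (perturbed-observations) EnKF analysis with multiplicative inflation (¹). The second is systematic resampling with effective-N monitoring for the particle filter (³). These are generic textbook versions written for illustration, assuming a linear observation operator `H`. They are not DAPPER's implementations, and the function names are hypothetical.

```python
import numpy as np

def enkf_analysis(E, y, H, R, infl=1.0, rng=None):
    """Stochastic EnKF analysis (perturbed observations) with inflation.

    E: prior ensemble, shape (N, Nx), one member per row.
    y: observation vector (Ny,); H: linear obs. operator (Ny, Nx);
    R: obs. error covariance (Ny, Ny); infl: multiplicative inflation factor.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = E.shape[0]
    mu = E.mean(axis=0)
    E = mu + infl * (E - mu)                     # inflate the prior spread
    A = E - mu                                   # state anomalies (N, Nx)
    Y = A @ H.T                                  # obs. anomalies (N, Ny)
    C = Y.T @ Y / (N - 1) + R                    # innovation covariance
    K = np.linalg.solve(C, Y.T @ A / (N - 1)).T  # Kalman gain (Nx, Ny)
    D = rng.multivariate_normal(np.zeros(len(y)), R, size=N)  # obs. perturbations
    return E + (y + D - E @ H.T) @ K.T

def systematic_resample(w, rng=None):
    """Indices of the particles to keep; w must be normalized weights."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(w)
    positions = (rng.random() + np.arange(N)) / N  # one random offset, N strata
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                                  # guard against round-off
    return np.searchsorted(cdf, positions, side="right")

def effective_N(w):
    """Effective number of particles; resample when it drops below a threshold."""
    return 1.0 / np.sum(np.asarray(w) ** 2)

# Usage
rng = np.random.default_rng(3)
N, Nx = 200, 3
E = 2.0 * rng.normal(size=(N, Nx))   # prior ensemble with std ~2
H = np.eye(Nx)[:1]                   # observe only the first component
R = np.array([[0.25]])
Ea = enkf_analysis(E, np.array([1.0]), H, R, infl=1.02, rng=rng)
print(E[:, 0].std(), Ea[:, 0].std())  # spread shrinks in the observed component

w = np.array([0.05, 0.8, 0.15, 0.0])  # normalized particle weights
print(effective_N(w))                 # about 1.5 out of 4: time to resample
idx = systematic_resample(w, rng)     # particle 3 (zero weight) is never kept
```

Systematic resampling uses a single random number for all particles, which gives lower resampling noise than drawing each index independently (plain multinomial resampling).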

For a list of ready-made experiments with suitable, tuned settings for a given method (e.g. the iEnKS), use:

grep -r "xp.*iEnKS" dapper/mods

Test cases (models)

Model	Lin	TLM**	PDE?	Phys.dim.	State len	Lyap≥0	Implementer
Id	Yes	Yes	No	N/A	*	0	Raanes
Linear Advect. (LA)	Yes	Yes	Yes	1d	1000 *	51	Evensen/Raanes
DoublePendulum	No	Yes	No	0d	4	2	Matplotlib/Raanes
Ikeda	No	Yes	No	0d	2	1	Raanes
LotkaVolterra	No	Yes	No	0d	5 *	1	Wikipedia/Raanes
Lorenz63	No	Yes	"Yes"	0d	3	2	Sakov
Lorenz84	No	Yes	No	0d	3	2	Raanes
Lorenz96	No	Yes	No	1d	40 *	13	Raanes
Lorenz96s	No	Yes	No	1d	10 *	4	Grudzien
LorenzUV	No	Yes	No	2x 1d	256 + 8 *	≈60	Raanes
LorenzIII	No	No	No	1d	960 *	≈164	Raanes
Vissio-Lucarini 20	No	Yes	No	1d	36 *	10	Yumeng
Kuramoto-Sivashinsky	No	Yes	Yes	1d	128 *	11	Kassam/Raanes
Quasi-Geost (QG)	No	No	Yes	2d	129²≈17k	≈140	Sakov

*: Flexible; set as necessary
**: Tangent Linear Model included?
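To show what one of these dynamical models involves, here is a standalone sketch of the Lorenz96 dynamics. It uses the standard formulation, with forcing F = 8 and cyclic boundary conditions, integrated with a fourth-order Runge-Kutta step. The function names and the time step are illustrative assumptions; this is not the code in DAPPER's own Lorenz96 module.

```python
import numpy as np

def lorenz96_dxdt(x, force=8.0):
    """Lorenz96 tendencies: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F.

    np.roll implements the cyclic indexing: roll(x, -1)[i] == x[i+1], etc.
    """
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + force

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

Nx = 40                   # state length as in the table (flexible)
x = 8.0 * np.ones(Nx)     # the equilibrium x_i = F ...
x[0] += 0.01              # ... slightly perturbed
for _ in range(1000):     # integrate to t = 50 with dt = 0.05
    x = rk4_step(lorenz96_dxdt, x, dt=0.05)
print(x.round(2))
```

The equilibrium x_i = F is unstable, so the small perturbation grows and the trajectory becomes chaotic.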

The models are found as subdirectories within dapper/mods. A model should be defined in a file named __init__.py, and illustrated by a file named demo.py. Most other files within a model subdirectory are usually named authorYEAR.py and define an HMM object, which holds the settings of a specific twin experiment using that model, as detailed in the corresponding author/year's paper. A list of these files can be obtained using

find dapper/mods -iname '[a-z]*[0-9]*.py'

Some files contain settings used by several papers. Moreover, at the bottom of each such file there should be (in comments) a list of suitable, tuned settings for various DA methods, along with their expected average rmse.a score for that experiment. As mentioned above, DAPPER reproduces literature results. You will also find results that were not reproduced by DAPPER.

Similar projects

DAPPER is aimed at research and teaching (see discussion up top). Examples of limitations:

It is not suited for very big models (>60k unknowns).
Non-uniform time sequences are not fully supported.

The scope of DAPPER is restricted because

Moreover, even straying beyond basic configurability appears unrewarding when already building on a high-level language such as Python. Indeed, you may freely fork and modify the code of DAPPER, which should be seen as a set of templates, and not a framework.

Also, DAPPER comes with no guarantees/support. Therefore, if you have an operational or real-world application, such as WRF, you should look into one of the alternatives, sorted by approximate project size.

Name	Developers	Purpose (approximately)
DART	NCAR	General
PDAF	AWI	General
JEDI	JCSDA (NOAA, NASA, ++)	General
OpenDA	TU Delft	General
EMPIRE	Reading (Met)	General
ERT	Statoil	History matching (Petroleum DA)
PIPT	CIPR	History matching (Petroleum DA)
MIKE	DHI	Oceanographic
OAK	Liège	Oceanographic
Siroco	OMP	Oceanographic
Verdandi	INRIA	Biophysical DA
PyOSSE	Edinburgh, Reading	Earth-observation DA

Below is a list of projects with a purpose more similar to DAPPER's (research in DA, and not so much using DA):

Name	Developers	Notes
DAPPER	Raanes, Chen, Grudzien	Python
SANGOMA	Conglomerate*	Fortran, Matlab
hIPPYlib	Villa, Petra, Ghattas	Python, adjoint-based PDE methods
FilterPy	R. Labbe	Python. Engineering oriented.
DASoftware	Yue Li, Stanford	Matlab. Large inverse probs.
Pomp	U of Michigan	R
EnKF-Matlab	Sakov	Matlab
EnKF-C	Sakov	C. Light-weight, off-line DA
pyda	Hickman	Python
PyDA	Shady-Ahmed	Python
DasPy	Xujun Han	Python
DataAssim.jl	Alexander-Barth	Julia
DataAssimilationBenchmarks.jl	Grudzien	Julia, Python
EnsembleKalmanProcesses.jl	Clim. Modl. Alliance	Julia, EKI (optim)
Datum	Raanes	Matlab
IEnKS code	Bocquet	Python

The EnKF-Matlab and IEnKS codes have been inspirational in the development of DAPPER.

*: AWI/Liege/CNRS/NERSC/Reading/Delft

Contributing

Issues and Pull requests

Do not hesitate to open an issue, whether to report a problem or ask a question. It may take some time for us to get back to you, since DAPPER is primarily a volunteer effort. Please start by perusing the documentation and searching the issue tracker for similar items.

Pull requests are very welcome. Examples: adding new DA methods, dynamical models, or experimental configurations reproducing literature results, or improving the features and capabilities of DAPPER. Please keep in mind the intentional limitations and read the developers' guidelines.

Contributors

Patrick N. Raanes, Yumeng Chen, Colin Grudzien, Maxime Tondeur, Remy Dubois

DAPPER is developed and maintained at NORCE (Norwegian Research Centre) and the Nansen Environmental and Remote Sensing Center (NERSC), in collaboration with the University of Reading, the UK National Centre for Earth Observation (NCEO), and the Center for Western Weather and Water Extremes (CW3E).