GrainLearning

A Bayesian uncertainty quantification toolbox for discrete and continuum numerical models of granular materials


Keywords
Bayesian inference, Uncertainty quantification, Non-parametric Gaussian mixture, DEM, Constitutive modeling, Granular materials, bayesian-inference, low-discrepancy-sequences, mixture-models, parameter-identification, sequential-monte-carlo, uncertainty-quantification
License
GPL-2.0
Install
pip install GrainLearning==2.0.3

Documentation

Welcome to GrainLearning!

Authors Hongyang Cheng, Philipp Hartmann, and Klaus Thoeni
Contacts h.cheng@utwente.nl
Version 1.0

Description

GrainLearning is a Python-based Bayesian calibration tool for estimating parameter uncertainties in geomechanical models (currently only particle-scale and elasto-plasticity models are tested). It uses the recursive Bayes' rule to quantify the evolution of the probability distribution of the parameters over time or a load history. Samples are drawn either uniformly, assuming no prior knowledge, or from a proposal distribution that is iteratively learned. Once enough statistics are gathered per iteration (measured by the effective sample size), we train nonparametric Gaussian mixture models and use them as proposal distributions to resample the parameter space. The mixture model trained at the end of each iteration guides the resampling to be asymptotically close to the global optima, and thus greatly reduces the computational cost compared with conventional approaches.

GrainLearning is developed to serve the ever-increasing need for fast and accurate estimation of particle- or microstructure-scale parameters in discrete element method (DEM) simulations of granular materials. For that purpose, GrainLearning is implemented to work seamlessly with Yade, an open-source discrete numerical modeling framework. However, integrating GrainLearning with other codes (e.g., MercuryDPM, and methods not limited to DEM) is rather straightforward. One could either pass parameter samples generated by GrainLearning from Python to (the executable of) your model or implement your own runModel module in Python.
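As a minimal sketch of the second option, the snippet below shows how a hypothetical runModel function could pass one parameter sample to an external executable and read its output back into Python; the executable name, command-line convention, and output format are all assumptions for illustration.

# A hypothetical runModel wrapper: the executable name, argument
# convention, and output format are assumptions for illustration.
import subprocess
import numpy as np

def runModel(params, sampleID):
    """Run one model evaluation for a single parameter sample."""
    outFile = 'sim_%i.txt' % sampleID
    # pass the parameter values on the command line of the external model
    cmd = ['./myModel'] + ['%e' % p for p in params] + [outFile]
    subprocess.run(cmd, check=True)
    # read the simulation output back into Python for Bayesian filtering
    return np.loadtxt(outFile)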

Introduction

In particle simulations, plastic deformation at the macro scale arises from contact sliding in the tangential and rolling/twisting directions and from irrecoverable changes of the microstructure. Because the parameters relevant to these microscopic phenomena are not directly measurable in a laboratory, calibration of DEM models is generally treated as an inverse problem and solved with "inverse methods", ranging from trial and error to sophisticated statistical inference. Solving an inverse problem that involves nonlinearity and/or discontinuity in the forward model (DEM) is very challenging. Furthermore, because of the potentially large computational cost of running the simulations, the "trials" have to be selected with an optimized strategy to boost efficiency.

In GrainLearning, the probability distribution of model states and parameters, conditioned on given reference data (termed "posterior distribution") can be approximated by sequential Monte Carlo methods. To efficiently sample parameter space, a multi-level (re)sampling algorithm is utilized. For the first iteration of Bayesian filtering, the parameter values are uniformly sampled from quasi-random numbers, which leads to conventional sequential quasi-Monte Carlo (SQMC) filtering. For the subsequent iterations, new parameter values are drawn from the posterior distribution updated by the previous iteration. Iterative Bayesian filtering allows us to sample near potential posterior modes in parameter space, with an increasing sample density over the iterations, until the ensemble predictions of the model parameters converge.

Bayesian filtering

The aim of Bayesian filtering is to estimate the probability distribution of the augmented model state $\hat{\mathbf{x}}_t$, which consists of the parameters (assuming they vary in time) and the model prediction, conditioned on all data $\mathbf{y}_{1:t}$ available up to time $t$. From this conditional probability (termed "posterior") distribution, $p(\hat{\mathbf{x}}_t|\mathbf{y}_{1:t})$, the moments such as the means and variances can be computed as

$$E[f(\hat{\mathbf{x}}_t)] \approx \sum_{i=1}^{N_s} w_t^{(i)} f(\hat{\mathbf{x}}_t^{(i)}),$$

where $f$ describes an arbitrary quantity of interest as a function of the augmented model state, and $N_s$ is the number of samples drawn from a proposal distribution. Each sample $\hat{\mathbf{x}}_t^{(i)}$ is associated with an importance weight $w_t^{(i)}$ that corrects the Monte Carlo approximation of the full posterior distribution from the proposal distribution, using the recursive Bayes' rule. Note that the importance weights are normalized to sum to unity at each time step.
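As a concrete illustration of this weighted estimate, the sketch below computes the posterior mean and variance of a single parameter from samples and normalized importance weights; the numbers are made up.

# A minimal sketch of the weighted posterior moments; the sample values
# and weights below are made up for illustration.
import numpy as np

samples = np.array([20e9, 35e9, 50e9, 80e9])  # e.g., Young's modulus samples
weights = np.array([0.1, 0.5, 0.3, 0.1])      # normalized importance weights

mean = np.sum(weights * samples)               # E[f] with f(x) = x
var = np.sum(weights * (samples - mean)**2)    # posterior variance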

Iterative sampling using nonparametric Gaussian mixtures

Because the posterior distribution of the model parameters is typically unknown, a non-informative proposal density is used to uniformly explore the parameter space in the first iteration ($k=0$). The sampling is done with quasi-random numbers (within an initial guess of the upper and lower bounds of the parameters), which leads to the so-called sequential quasi-Monte Carlo (SQMC) filter, with a number of samples $N_s$ that scales with $d\log d$, where $d$ is the number of unknown parameters. Although a uniform initial sampling is unbiased, it is very inefficient and guaranteed to degenerate as time approaches infinity. To circumvent weight degeneracy, the inverse problem is solved again from time $0$ to $T$, with new samples drawn from a more sensible proposal distribution and with reinitialized weights.

The posterior distribution of the parameters resulting from the previous round of Bayesian filtering is chosen as the proposal distribution: the parameter samples for iteration $k$ are drawn from the posterior distribution estimated at iteration $k-1$. For each iteration with $k>0$, the proposal density is constructed by training a nonparametric Gaussian mixture with the samples and importance weights from the previous iteration. Because the sample density is higher the closer one gets to a posterior mode, resampling from the repeatedly updated proposal density allows zooming into the highly probable parameter subspace within very few iterations. The iterative (re)sampling scheme brings three major advantages to the Bayesian filtering framework:

  1. The posterior distribution is iteratively estimated with an increasing resolution on the posterior landscape.
  2. The multi-level sampling algorithm keeps allocating model evaluations in parameter subspace where the posterior probabilities are expected to be high, thus significantly improving computational efficiency.
  3. Resampling between two consecutive iterations effectively overcomes the weight degeneracy problem while keeping the sample trajectories intact within the time/load history.

Code and data structure

The SMC class

Capabilities

Data directories and formatting

Reference or observation data

GrainLearning stores your reference/observation data as a dictionary-type variable in the class smc. Therefore, the user needs to provide the keys to the data sequences for the control and the output of a model evaluation (e.g., axial strain and axial stress in a triaxial test). These two (lists of) keys are used to extract the corresponding datasets from the data file of each model evaluation.

# key for simulation control
obsCtrl = 'u'
# key for output data
simDataKeys = ['f']
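For example, assuming the data file carries a header line naming its columns (the actual file layout may differ), these keys can pick out the control and output sequences as follows.

# A minimal sketch, assuming collision.dat carries a header line naming
# its columns (e.g., 'u f'); the actual file layout may differ.
import numpy as np

data = np.genfromtxt('collision.dat', names=True)
ctrlData = data[obsCtrl]                           # control sequence, e.g., displacement
simData = {key: data[key] for key in simDataKeys}  # output sequences, e.g., force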

Tutorials

The tutorials are designed to work with, or post-process, simulation data from Yade. Model evaluations here are therefore evaluations of a Yade-DEM model. In the case of DEM modeling of granular soils, relevant parameters could be the Young's modulus, friction coefficient, Poisson's ratio, rolling stiffness, and rolling friction of a soil particle, as well as structural parameters like a particle size distribution parameterized by its moments. In the following, we will walk you through the steps of Bayesian calibration using GrainLearning. The DEM model here is kept as simple as possible, such that the calibration should finish within a few minutes on a normal laptop.

Two-particle collision in the normal direction

For the sake of brevity, we shall first use the simplest DEM simulation, i.e., two particles colliding in the normal direction, described by the Hertz-Mindlin contact law. Assuming a perfectly elastic collision, the number of parameters reduces to only two, namely, the particle-scale Young's modulus $E$ and Poisson's ratio $\nu$ (on which the response depends only weakly). In the following, we will demonstrate the capability of GrainLearning using such a simple DEM model. The tutorial CaliCollision.py is set up as follows.

  1. Create a synthetic dataset from the "two-particle" simulation (already in GrainLearning/collision.dat) as the ground truth

    ObsData = np.loadtxt('collision.dat')
    
  2. Add Gaussian noise to the ground truth

    noise = np.random.normal(0, 0.1 * max(ObsData[1]), len(ObsData[1]))
    
  3. Run GrainLearning iteratively until the error is sufficiently small and see if the correct parameter values are recovered

    # iterate the problem
    while smcTest.sigma > 1.0e-2 and iterNO < maxNumOfIters:
        # initialize the weights and the Gaussian mixture model
        smcTest.initialize(maxNumComponents, priorWeight, covType=covType)
        # run sequential Monte Carlo
        ips, covs = smcTest.run(iterNO=iterNO)
        # move on to the next iteration
        iterNO += 1
    

The driver script

To run the tutorial, execute the driver script CaliCollision.py:

ipython3 CaliCollision.py

You will then be asked to give your initial guess of the upper limit of the normalized covariance $\sigma$. You can think of $\sigma$ as the error between the ground truth and your model prediction, which could be 100% or even larger. In fact, GrainLearning tries to find the $\sigma$ that results in an effective sample size equal to a user-specified value, via a root-finding algorithm from the SciPy library.

# user-defined parameter: upper limit of the normalized covariance coefficient
sigma = float(input("Give an initial guess of the upper limit of normalized covariance: "))
# target effective sample size specified by the user
ess = 0.3
obsWeights = [1.0]
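To make this concrete, here is a minimal sketch, not GrainLearning's actual internals, of how $\sigma$ can be tuned with SciPy root finding so that the normalized effective sample size hits the target; the Gaussian likelihood form and the residuals are assumptions for illustration.

# A minimal sketch, not GrainLearning's internals: tune sigma so that the
# normalized effective sample size (ESS) matches the target value ess.
import numpy as np
from scipy.optimize import brentq

def normalizedESS(sigma, residuals):
    # assumed Gaussian likelihood with covariance proportional to sigma^2
    logw = -0.5 * np.sum(residuals**2, axis=1) / sigma**2
    w = np.exp(logw - logw.max())   # shift by the max for numerical stability
    w /= w.sum()                    # normalize the weights to sum to unity
    return 1.0 / np.sum(w**2) / len(w)

# hypothetical residuals between model predictions and observation data
residuals = np.random.default_rng(0).normal(size=(100, 50))
sigma = brentq(lambda s: normalizedESS(s, residuals) - ess, 1e-6, 1e2)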

Before entering the iteration loop of Bayesian filtering, provide the names of the unknown parameters and their upper and lower limits. They will be used only in the first iteration for generating uniformly distributed parameter samples, using a low-discrepancy sequence from the Ghalton library (see the advantages of low-discrepancy sequences over pseudo-random numbers here).

# give ranges of parameter values (E, \nu)
paramNames = ['E', 'nu']
numParams = len(paramNames)
# use uniform sampling for the first iteration
paramRanges = {'E': [10e9, 100e9], 'nu': [0.1, 0.5]}
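A minimal sketch of this initial sampling step, assuming the ghalton package: points of a Halton sequence in the unit hypercube are scaled to the parameter ranges defined above.

# A minimal sketch, assuming the ghalton package: scale Halton points in
# the unit hypercube to the parameter ranges defined above.
import ghalton

sequencer = ghalton.Halton(numParams)
unitSamples = sequencer.get(numSamples)   # numSamples is set in the next code block
initSamples = [[lo + u * (hi - lo)
                for u, (lo, hi) in zip(point, [paramRanges[name] for name in paramNames])]
               for point in unitSamples]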

For the iterations after the first, parameter samples are drawn from the Gaussian mixture model that is trained at the end of the previous iteration. Although the Gaussian mixture model used here is nonparametric, it still requires hyperparameters, which are rough estimates of what the actual values should be (e.g., the maximum number of Gaussian components is a parameter which itself is estimated during the training). For the technical details on the maximum number of Gaussian components and the prior weight (weight_concentration_prior), see the documentation of the BayesianGaussianMixture class of scikit-learn. For demonstration purposes, we choose 10 for the sample size of a 1D problem (one unknown parameter), which in our experience seems to be enough to give reasonable results.

# set number of samples per iteration (e.g., num1D * N * logN for quasi-Sequential Monte Carlo)
from math import log
numSamples1D = 10
numSamples = int(numSamples1D * numParams * log(numParams))
# set the maximum Gaussian components and prior weight
maxNumComponents = int(numSamples / 10)
priorWeight = 1. / maxNumComponents
covType = 'tied'
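The snippet below is a minimal sketch, assuming scikit-learn, of how such a nonparametric Gaussian mixture could be trained on weighted samples and then used to propose new ones; expanding the samples in proportion to their weights is one possible workaround for fit() not accepting per-sample weights, and not necessarily what GrainLearning does internally.

# A minimal sketch, assuming scikit-learn; not necessarily GrainLearning's
# exact resampling procedure.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def resampleFromMixture(samples, weights, numNew, maxNumComponents, priorWeight, covType):
    # samples: (N, d) array; weights: normalized importance weights
    # expand the samples in proportion to their weights, since
    # BayesianGaussianMixture.fit does not accept per-sample weights
    counts = np.random.multinomial(10 * len(samples), weights)
    expanded = np.repeat(samples, counts, axis=0)
    gmm = BayesianGaussianMixture(n_components=maxNumComponents,
                                  weight_concentration_prior=priorWeight,
                                  covariance_type=covType).fit(expanded)
    newSamples, _ = gmm.sample(numNew)
    return newSamples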

GrainLearning can be used in two different ways:

  1. Interactive modes that allow iterative Bayesian calibration, with

    1.1. Yade fully integrated with GrainLearning in a Python environment (import Yade as a Python library)

    1.2. Yade-batch jobs sent from Python to the command line and the data read into GrainLearning after simulations are finished

  2. Stand-alone mode that only generates samples or post-processes pre-run simulation data to obtain parameter distributions

The following subsections demonstrate the usage of the two interactive modes and the stand-alone mode. Note that for evaluating numerical models other than Yade-DEM, one could use either the interactive mode with the model run outside Python or the stand-alone mode that handles one iteration at a time.

Two interactive modes

Mode I: Run model evaluations (DEM) within GrainLearning (Python)

The most efficient way of using GrainLearning is to wrap everything within a Python environment. This keeps all simulation data in memory, without the need to write and read them externally, and is thus less prone to errors such as mismatches between parameter samples and simulation data. To our knowledge, this interactive mode is currently only possible with Yade. Furthermore, a unified Python environment also makes it possible to synchronize Bayesian calibration and Yade in time, meaning the user can pause the calibration at certain times, take a look at the probability distribution of the parameters, check whether the statistics have converged, and then decide whether to proceed in time or jump directly to the resampling step.

This interactive mode can be switched on simply by setting the variables runYadeInGL=True and standAlone=False.

# interactive mode with Yade running inside GrainLearning
smcTest = smc(sigma, ess, obsWeights,
              obsCtrl=obsCtrl, simDataKeys=simDataKeys, obsFileName='collisionObs.dat',
              loadSamples=False, runYadeInGL=True, standAlone=False)

# generate the initial parameter samples with a low-discrepancy sequence
smcTest.initParams(paramNames, paramRanges, numSamples)

In order to use Yade within GrainLearning, some modifications have to be made on the Yade side.

  • Load Yade as a Python library. See details in here

      # yadeimport.py is generated by `ln yade-versionNo yadeimport.py`
      import sys
      # add the directory where Yade is installed to the Python path
      sys.path.append(<directory where Yade is installed>)
      # import external dependencies
      from yadeimport import *
    
  • Convert a Yade driver script (e.g., Collision.py) into functions (YadeSim.py) that can be accessed from GrainLearning and managed by Python's multiprocessing module, as sketched below
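Below is a hypothetical sketch of what such a YadeSim.py could look like. The real tutorial runs a Yade DEM simulation; here a closed-form Hertzian normal-force law stands in for it, purely to show the calling convention that multiprocessing expects (a top-level function that takes one parameter sample and returns the simulation data).

# YadeSim.py: a hypothetical sketch of the converted driver script; a
# closed-form Hertzian law replaces the actual Yade simulation here.
import numpy as np
from multiprocessing import Pool

def runCollision(params):
    E, nu = params
    radius = 0.1                        # assumed particle radius
    u = np.linspace(0, 1e-4, 100)       # normal overlap between the particles
    Estar = E / (2.0 * (1.0 - nu**2))   # effective modulus, identical spheres
    Rstar = radius / 2.0                # effective radius, identical spheres
    f = 4.0 / 3.0 * Estar * np.sqrt(Rstar) * u**1.5   # Hertzian normal force
    return {'u': u, 'f': f}

if __name__ == '__main__':
    testSamples = [(15e9, 0.2), (40e9, 0.3)]   # hypothetical parameter samples
    with Pool() as pool:
        allSimData = pool.map(runCollision, testSamples)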

Mode II: Run model evaluations (DEM) outside GrainLearning (Python)

This interactive mode is perhaps the easiest way to use GrainLearning and Yade (or other numerical models) together. The steps are to

  1. create a parameter table from GrainLearning,
  2. run Yade in batch mode using the table,
  3. and return to GrainLearning once the simulations are finished.

This is essentially the stand-alone mode plus Python sending the Yade-batch jobs to the command line. Note that for non-Yade users, step 2 can be replaced by any command that executes the model evaluations using the parameter table, as sketched below.
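A minimal sketch of step 2, dispatched from Python; the table and script names follow this tutorial, but the exact yade-batch invocation may differ on your system.

# A minimal sketch of dispatching the batch job from Python; the exact
# yade-batch invocation may differ on your system.
import subprocess

subprocess.run(['yadedaily-batch', 'smcTable0.txt', 'Collision.py'], check=True)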

To switch on this interactive mode, simply set runYadeInGL=False and standAlone=False. Of course, the user has to provide the Yade script, which in our case is yadeScript='Collision.py', the directory where the simulation data are stored, yadeDataDir='Collision', and the Yade command, e.g., yadeVersion='yade-batch'. simName should also be provided to make the search for simulation data in yadeDataDir easier.

# interactive mode with Yade running outside GrainLearning
smcTest = smc(sigma, ess, obsWeights,
              yadeVersion='yadedaily-batch', yadeScript='Collision.py', yadeDataDir='Collision', threads=threads,
              obsCtrl=obsCtrl, simDataKeys=simDataKeys, simName='2particle', obsFileName='collisionObs.dat',
              loadSamples=False, runYadeInGL=False, standAlone=False)

# generate the initial parameter samples with a low-discrepancy sequence
smcTest.initParams(paramNames, paramRanges, numSamples)

Start iterative Bayesian calibration with a user-defined parameter table

The previous two sections assume that you have no prior knowledge of the probability distribution of the parameters; you would like to start from a state of ignorance, i.e., uniform sampling. However, if you have some prior knowledge, you might want to start iterative Bayesian calibration with your own parameter samples that are not drawn uniformly. Such samples can be drawn either from a distribution provided externally by another software package or from previous runs of GrainLearning. We call the distribution from which these samples are drawn a proposal distribution. In fact, the key to Bayesian calibration is choosing the proposal distribution such that it approximates (at least part of) the probability landscape over your model parameters, conditioned on your reference/observation data.

To start off with parameter samples known a priori, you only need the following two things:

  1. Set the flag loadSamples=True when creating the smc object. You can use either interactive mode I or II, depending on whether your model can be evaluated within GrainLearning (Python).

     # interactive mode with Yade running inside GrainLearning
     smcTest = smc(sigma, ess, obsWeights,
                   yadeDataDir='Collision', threads=threads,
                   obsCtrl=obsCtrl, simDataKeys=simDataKeys, obsFileName='collisionObs.dat',
                   loadSamples=True, runYadeInGL=True, standAlone=False)
    

    or

     # interactive mode with Yade running outside GrainLearning
     smcTest = smc(sigma, ess, obsWeights,
                   yadeVersion='yadedaily-batch', yadeScript='Collision.py', yadeDataDir='Collision', threads=threads,
                   obsCtrl=obsCtrl, simDataKeys=simDataKeys, simName='2particle', obsFileName='collisionObs.dat',
                   loadSamples=True, runYadeInGL=False, standAlone=False)
    
  2. Specify the text file where your parameter samples are or will be stored paramsFile='<File name of the parameter table>.txt'.

     # load the initial parameter samples externally 
     smcTest.initParams(paramNames, paramRanges, numSamples, paramsFile='smcTableUser.txt')
    

What if I have already obtained simulation data using my own parameter samples?

If the simulation data are already available, you surely do not want to redo the calculations. What you can do is temporarily switch to the stand-alone mode, only for the first iteration, and then switch back to one of the interactive modes after the first iteration is finished. The rest is the same as in either interactive mode. Fortunately, the switch between the interactive and stand-alone modes is already implemented; it depends on whether simulation data are available in a user-defined subdirectory under smc.yadeDataDir. All you need to do is specify the name of that subdirectory, subDir, when initializing the parameter samples.

# load the initial parameter samples externally
smcTest.initParams(paramNames, paramRanges, numSamples, paramsFile='smcTableUser.txt', subDir='iter0')

It is crucial to ensure that the simulation data read in from the files in subDir match the parameter samples in paramsFile correctly. Whether they match is checked in the member function checkParamsError() of the smc class. In order to make smc.checkParamsError() work, your data files have to be named as <simName>_<key>_<param0>_<param1>_..._<paramN>.txt, so that the simulation name, the number (key) of the model evaluation, and the parameter values are kept in the file name. We ask the user to follow this convention so that GrainLearning can check for this potential error and recover your parameter samples even if your parameter table is lost or overwritten.

Extract parameter samples from the names of simulation files

If your parameter table is indeed lost or accidentally overwritten, GrainLearning can still extract the parameter samples from the names of your data files, as the sketch below illustrates. The extracted parameter samples will be saved to the file paramsFile, which did not exist before running GrainLearning. The default name 'TableFromFileNames.txt' is used if you do not specify paramsFile explicitly.
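A minimal sketch of how one parameter sample can be recovered from a file name following the naming convention above; the file name is made up.

# A minimal sketch of recovering a parameter sample from a file name
# following the <simName>_<key>_<param0>_..._<paramN>.txt convention;
# the file name below is made up.
name = '2particle_0_15000000000.0_0.2.txt'
fields = name[:-len('.txt')].split('_')
simName, key = fields[0], fields[1]
params = [float(v) for v in fields[2:]]   # e.g., [E, nu] for this sample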

# recover the parameter samples from the data file names in subDir
smcTest.initParams(paramNames, paramRanges, numSamples, paramsFile='smcTableUser.txt', subDir='iter0')

The search for simulation data files can be unsuccessful when simName is set incorrectly. You will be asked to update simName interactively in the function smc.getYadeDataFiles() until the data files are found.

Stand-alone mode

As mentioned before, the stand-alone mode handles one iteration at a time. In this mode, you can skip model evaluations if you only wish to create an initial set of uniformly distributed samples. You can also use GrainLearning as a post-process tool to estimate parameter distribution using pre-run simulation data stored in subDir.

Create an initial set of uniformly distributed samples

To generate an initial set of samples using a low-discrepancy sequence, you only need the code pieces before the iteration loop. The relevant flags are loadSamples=False and standAlone=True. After smc.initParams(...), an initial parameter table named smcTable0.txt will be created.

# generate the initial parameter samples with a low-discrepancy sequence
smcTest.initParams(paramNames, paramRanges, numSamples)

Use GrainLearning as a post-processing tool

To get a feeling for the stand-alone mode, use the script CaliCollision_StandAlone.py, which is a shortened version of CaliCollision.py without the loop of iterative Bayesian filtering. In contrast to the subsection above, one needs to set loadSamples=True and specify the parameter table and data subdirectory in order to proceed to the resampling step.

# stand-alone mode: use GrainLearning as a post-process tool using pre-run DEM data
smcTest = smc(sigma, ess, obsWeights,
              yadeDataDir='Collision', threads=threads,
              obsCtrl=obsCtrl, simDataKeys=simDataKeys, simName='2particle', obsFileName='collisionObs.dat',
              loadSamples=True, runYadeInGL=False, standAlone=True)

# load or generate the initial parameter samples
smcTest.initParams(paramNames, paramRanges, numSamples, paramsFile='smcTable%i.txt' % iterNO, subDir='iter%i' % iterNO)

Oedometric compression of granular materials in a periodic box

Another example of the stand-alone mode is the calibration of oedometric compression. Pre-run simulation data are stored in the subdirectories of "Oedometric", and the trained proposal densities are saved in pickle format. You might want to take a look at this tutorial to get a feeling for iterative Bayesian calibration in action. The sample size of 100 is sufficient for four unknown parameters, and could even be a bit conservative. In fact, it is difficult to determine an appropriate sample size: the optimal size is problem-dependent and limited by the resources at your disposal. We are eager to hear about the experience you gain with your particular problems.

Interpretation of the plots

Advanced

Installation

Follow the steps below to install GrainLearning on a Mac or Linux OS:

  • Git and Python
    • brew install git python (Mac)
    • or sudo apt install git python (Linux)
  • Python packages
    • sudo pip install numpy scipy matplotlib ghalton scikit-learn
  • Clone the git repository
  • Optionally, we recommend using a Python IDE (e.g., PyCharm)
    • After installing PyCharm, open it and select Open Project, then select the grainlearning directory. Go to Preferences, search for Project Interpreter, select the Interpreter textbox, click Open All, click the + symbol to add a new interpreter, select your Python executable, and, importantly, tick Inherit global site packages (so that packages like numpy are available).

Dependencies

Referencing GrainLearning

If you are using this software in work that will be published, please cite the following papers:

H. Cheng, T. Shuku, K. Thoeni, Y. Yamamoto. Probabilistic calibration of discrete element simulations using the sequential quasi-Monte Carlo filter. Granular Matter 20, 11 (2018). doi:10.1007/s10035-017-0781-y

H. Cheng, T. Shuku, K. Thoeni, P. Tempone, S. Luding, V. Magnanimo. An iterative Bayesian filtering framework for fast and automated calibration of DEM models. Comput. Methods Appl. Mech. Eng. 350 (2019), pp. 268-294. doi:10.1016/j.cma.2019.01.027

Help and Support

For assistance with the GrainLearning software, or with Bayesian calibration for geomechanical models in general, please raise an issue on the GitHub Issues page or drop me an email at h.cheng@utwente.nl.