Package for doing deep supervised learning on ATLAS data.
Author: Frederik Faye, The Niels Bohr Institute, 2019
This package allows you to build, train and tune convolutional neural network (CNN) models using Keras with any backend.
You can also integrate models for processing non-image data, such as scalars and sequences. The models that can be built have been designed specifically with the ATLAS detector in mind, but you can also just use the framework and all its nice features for any Keras-based project.
```
pip install deepcalo
```

Dependencies: numpy, pandas, matplotlib, h5py, joblib, tqdm, keras, tensorflow, scikit-optimize, keras-contrib, keras-drop-block.
If you want to be able to plot the graph of your model, please install `pydot` and `graphviz` (if possible, use `conda install python-graphviz` for graphviz).
The main functionality lies in the so-called model container, which can be imported as

```python
from deepcalo import ModelContainer
```
See the documentation below for all the details. However, often an example is a better way of learning. Some examples are found in the demos folder.
Download and run the MNIST tutorial:

```
python mnist_tutorial.py --exp_dir ./my_mnist_experiment/ -v 1
```
This will train a tiny CNN for a single epoch to discriminate between the digits of the MNIST dataset, which should reach $>95\%$ test accuracy after its first epoch.
Open the script to see what is going on; the important part is the hyperparameter section. Try playing around with the parameters to see if you can find a network that does better! Also have a look at the contents of the logs folder in the experiment directory (`./my_mnist_experiment/`) to see some of the nice logging features this framework has to offer.
There are a lot more hyperparameters to play around with. See the documentation for what is possible.
In the demos folder, you will also find the more realistic examples `run_model.py` and `hp_search.py`. The former shows how to run a single model (much like the MNIST example), while the latter shows how to use this package for doing hyperparameter optimization (using the Bayesian optimization of scikit-optimize).
Both use actual ATLAS simulation data to do energy regression. The data used herein can be downloaded from the lxplus (CERNBox) directory `/eos/user/l/lehrke/Data` (which should be visible to all current CERN members).
The scripts use the `deepcalo.utils.load_data` function, which is tailored to these datasets. If need be, you can modify this function to work with your own data; note, however, that this framework uses the `'channels_last'` image data format, which is the standard in Keras.
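For instance, if your images are stored in the `'channels_first'` format, a simple transpose (shown here with dummy data) brings them into the expected layout:

```python
import numpy as np

# Dummy images stored as (N, C, H, W), i.e. 'channels_first'
imgs_cf = np.random.randn(100, 2, 14, 25)

# Reorder to (N, H, W, C), i.e. 'channels_last', as expected by this framework
imgs_cl = np.transpose(imgs_cf, (0, 2, 3, 1))
print(imgs_cl.shape)  # (100, 14, 25, 2)
```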
The following is a quick tour of the different out-of-the-box models available. Each model is made for a different kind of input, e.g., images, scalar variables, tracks, or the output from other models.
All models except the top are optional to use. However, models are tied to their input in such a way that if, for instance, a `tracks` dataset is present in the supplied data, the track net will be integrated into the combined model.
You can find information about how to set the hyperparameters of these models in the documentation, where each model has its own section.
Below, an illustration of the default CNN architecture can be seen. It is comprised of blocks; each block except the first begins with downsampling and a doubling of the number of feature maps.
The tuple underneath the input denotes the size of the input (here height, width, channels).
Note that normalization, the activation function, downsampling and global average pooling can all be turned on or off.
The output of the CNN is passed on to the top.
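As a rough illustration of what one such block does, here is a minimal sketch of a `'simple'` block in Keras. This is not DeepCalo's exact implementation; the kernel size, ReLU activation and batch normalization are assumptions for the example:

```python
import keras

def simple_block(x, n_filters, n_layers, downsample=True):
    """One CNN block: optional downsampling, then n_layers conv layers."""
    if downsample:
        # All blocks but the first start by downsampling (halving H and W)
        x = keras.layers.MaxPooling2D(pool_size=2)(x)
    for _ in range(n_layers):
        x = keras.layers.Conv2D(n_filters, 3, padding='same')(x)
        x = keras.layers.BatchNormalization()(x)  # normalization before activation
        x = keras.layers.Activation('relu')(x)
    return x
```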
The top model is a simple, dense neural network that takes as input the concatenated outputs of other models, and gives a final output, which can be any 1D size $\geq 1$.
The scalar net is again a simple, dense network that processes any scalar variables you may want to include. Its output can be connected to either the top, the FiLM generator, or both.
The FiLM generator is a nice way of conditioning the CNN with scalar variables. You can read a good introduction to the technique here.
The FiLM generator can take inputs from both the scalar net and the track net. Its output modulates the CNN.
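In essence, FiLM applies a feature-wise affine transformation to the CNN's feature maps, with the scale and shift predicted from the conditioning inputs. A minimal sketch of the modulation itself (illustrative, not DeepCalo's exact code):

```python
from keras import backend as K

def film(feature_maps, gamma, beta):
    """Apply FiLM conditioning.

    feature_maps: (N, H, W, C) CNN activations.
    gamma, beta:  (N, C) scale and shift predicted by the FiLM generator.
    """
    # Broadcast the per-feature-map scale and shift over H and W
    gamma = K.expand_dims(K.expand_dims(gamma, 1), 1)  # -> (N, 1, 1, C)
    beta = K.expand_dims(K.expand_dims(beta, 1), 1)
    return feature_maps * gamma + beta
```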
NOTE: This architecture has not yet been tested.
This model takes the (varying) number of track vectors for a datapoint as input and spits out a fixed size representation of that datapoint, which is then passed on to the top, the FiLM generator, or both.
As the order in which we give our model the track vectors for a datapoint carries no information, the permutation invariant method of Deep Sets has been used.
The $T$ in the shape of $X_{\mathrm{track}}$ is the largest number of track vectors of any datapoint in the dataset, where zero-padding has been used if the actual number of tracks for a given datapoint is smaller than $T$.
Note that right now, the aggregation part of the track net is a simple sum, as in the Deep Sets paper.
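A minimal sketch of the Deep Sets idea (a per-track network, a permutation-invariant sum, then a network on the aggregate). The layer sizes are illustrative, and this is not DeepCalo's exact implementation:

```python
import keras
from keras import backend as K

T, F = 20, 6  # max number of tracks and features per track (illustrative)

tracks_in = keras.layers.Input(shape=(T, F))
# phi: applied to every track vector independently
x = keras.layers.TimeDistributed(keras.layers.Dense(32, activation='relu'))(tracks_in)
# Permutation-invariant aggregation: a simple sum over the track axis
# (zero-padded tracks still pass through phi; masking is omitted for brevity)
x = keras.layers.Lambda(lambda t: K.sum(t, axis=1))(x)
# rho: processes the fixed-size aggregate
out = keras.layers.Dense(16, activation='relu')(x)
track_net = keras.models.Model(tracks_in, out)
```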
The CNN architecture can be expanded to include information about the time at which each cell measured its signal, in order to help mitigate out-of-time pileup.
The time for each cell in each channel (typically corresponding to a layer in the calorimeter) is collected in an image tensor $X_{\mathrm{time-img}}$ of the same resolution and dimension as the standard cell image tensor $X_{\mathrm{img}}$.
$X_{\mathrm{time-img}}$ is first passed through a gating mechanism (the time net), which outputs a real number between zero and one for each pixel in each channel. These numbers are then merged with $X_{\mathrm{img}}$, either by element-wise multiplication followed by concatenation along the channel axis, or by element-wise multiplication alone. The idea is that the element-wise multiplication allows the network to tone down the values of out-of-time cells.
The resulting, merged tensor is then given as the input to the CNN (in the stead of $X_{\mathrm{img}}$).
It is recommended to use `pgauss_f` as the final activation in the time net.
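A minimal sketch of the gating (illustrative; a sigmoid is used here as a stand-in for the recommended `pgauss_f` activation, and the shapes are made up):

```python
import keras

img_in = keras.layers.Input(shape=(56, 55, 4))   # X_img (illustrative shape)
time_in = keras.layers.Input(shape=(56, 55, 4))  # X_time-img, same shape as X_img

# Stand-in for the time net: maps each time pixel to a gate value in [0, 1]
gate = keras.layers.Conv2D(4, kernel_size=1, activation='sigmoid')(time_in)

gated = keras.layers.Multiply()([img_in, gate])              # element-wise multiplication
merged = keras.layers.Concatenate(axis=-1)([img_in, gated])  # optional concatenation
# `merged` (or just `gated`) is then fed to the CNN instead of X_img
```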
In the final illustration below, you can see how the models all fit together.
See https://indico.cern.ch/event/800614/contributions/3327152/attachments/1799540/2936007/presentation_cnn.pdf for some slightly outdated results on energy regression using the above described networks with good hyperparameters.
The heart of DeepCalo is the `ModelContainer` class, found in `deepcalo.model_container`, which is documented below.
```python
class ModelContainer:
    """
    A class for organizing the creation, training and evaluation of models.
    """
    def __init__(self, data, params, dirs, save_figs=True, verbose=True):
```
data : dict
Dictionary of training, validation and (optionally) test data, organized according to the type of data.
This dictionary must have either two or three keys: `'train'` and `'val'`, or `'train'`, `'val'` and `'test'`. Each of these keys points to another dict containing different kinds of data. The keys of these inner dictionaries can be any or all of `'images'`, `'scalars'`, `'tracks'`, `'sample_weights'` and `'targets'`. The documentation for what each of these keys refers to is given below. Note that `'images'`, `'scalars'` and `'tracks'` are considered inputs, and at least one of them must be non-empty.
Note that the shapes of the datasets contained in `data` are used in the model creation (but a single datapoint is enough to do so).
images : dict of ndarrays
If `'images'` is a key in `data[set_name]` and `data[set_name]['images']` is non-empty (where `set_name` can be either `'train'`, `'val'` or `'test'`), a CNN will be created and used to process these images. To allow you to keep track of different types of images (intended to be used for different kinds of things), the value corresponding to the `'images'` key of `data` is also a dict.
Say you have two different kinds of images that you would like to have processed by the CNN. Let's call them `'low_res_imgs'` and `'high_res_imgs'` (i.e., these are the keys in the `data[set_name]['images']` dictionary), each being a 4D numpy array with shape $(N,H,W,C)$, where $H$ and $W$ are different for the two types of images. You can then use the `upsampling` functionality (see `params`) to upsample them to a common resolution, so that they can be processed together in the CNN.
If you want to use time images, these should be named exactly the same as the cell images they correspond to, with `'time_'` prepended; using the example from above, the `data[set_name]['images']` dictionary would then have the four keys `'low_res_imgs'`, `'time_low_res_imgs'`, `'high_res_imgs'` and `'time_high_res_imgs'`. Only if images are named in this manner will a submodel for processing the time images be created and used.
scalars : ndarray
If `'scalars'` is a key in `data[set_name]` and `data[set_name]['scalars']` is non-empty (where `set_name` can be either `'train'`, `'val'` or `'test'`), a scalar net will be created and used to process these scalars. The scalars should come in the form of an $(N,S)$ numpy array, where $N$ is the number of datapoints in the set, and where $S$ is the number of scalars.
tracks : ndarray
If `'tracks'` is a key in `data[set_name]` and `data[set_name]['tracks']` is non-empty (where `set_name` can be either `'train'`, `'val'` or `'test'`), a track net will be created and used to process these tracks. The track vectors should come in the form of an ndarray of ndarrays, where the inner ndarrays should have size $(T,F)$, where $T$ is the number of tracks for that particular datapoint, and where $F$ is the length of each track vector.
Note that the track net is still work-in-progress.
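For illustration, such a ragged collection of track arrays could be built (and, as described in the model overview above, zero-padded to a common length $T$) roughly like this, using dummy data:

```python
import numpy as np

F = 6  # length of each track vector (illustrative)

# A ragged collection: each datapoint has its own number of tracks
tracks = np.empty(100, dtype=object)
for i in range(100):
    n_tracks = np.random.randint(1, 8)
    tracks[i] = np.random.randn(n_tracks, F)

# Zero-pad to the largest number of tracks in the dataset
T = max(t.shape[0] for t in tracks)
X_track = np.zeros((len(tracks), T, F))
for i, t in enumerate(tracks):
    X_track[i, :t.shape[0]] = t
```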
sample_weights : ndarray
If `'sample_weights'` is a key in `data[set_name]` and `data[set_name]['sample_weights']` is non-empty (where `set_name` can be either `'train'`, `'val'` or `'test'`), then these sample weights will be used in all loss functions and metrics (both during training and evaluation).
targets : ndarray
The targets (labels) that the model will be trained to predict.
Example of a valid `data` dictionary:
```python
import numpy as np

set_names = ['train', 'val', 'test']  # Could also just be ['train', 'val']

# Number of datapoints for each set
n_points = {set_name: int(1e3) for set_name in set_names}

# Dimension of images, which we will call 'example_imgs'
h, w, c = 14, 25, 2

# Number of scalars
n_scalars = 7

# Create the data
data = {set_name: {'images': {'example_imgs': np.random.randn(n_points[set_name], h, w, c)},
                   'scalars': np.random.randn(n_points[set_name], n_scalars),
                   'tracks': {},  # Is empty, so track_net won't be created and used
                   'targets': np.random.randn(n_points[set_name])}  # Here, the target is a single number per datapoint
        for set_name in set_names}
```
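Such a dictionary could then be handed to the `ModelContainer` along with hyperparameters and log directories, roughly as follows (a sketch; the exact signature of `create_directories` is an assumption here, though both utility functions are documented below):

```python
from deepcalo import ModelContainer
from deepcalo.utils import get_default_params, create_directories

params = get_default_params()                  # default hyperparameters
dirs = create_directories('./my_experiment/')  # assumed signature
mc = ModelContainer(data=data, params=params, dirs=dirs)
```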
params : dict
This dictionary contains all the hyperparameters used in constructing, training and evaluating the model. Default parameters can be gotten from the function `deepcalo.utils.get_default_params`.
When a dictionary key is referenced below, what is actually meant is the value corresponding to that key. For instance, although the key `'epochs'` is of course a str, the documentation below concerns itself with the value of this key, which in this case is an int.
epochs : int
The number of epochs to train for. If `use_earlystopping` is set to `True`, training may stop prior to completing the chosen number of epochs.
batch_size : int
The batch size used in training.
loss : str
Which loss function to use. See `get_loss_function` in `model_building_functions.py` for implemented custom loss functions, as well as how to implement your own.
metrics : list of strs or None
Which metrics to use. If `None`, the `loss` will be the evaluation function used by the Gaussian process hyperparameter search.
optimizer : str or config dict
Which optimizer to use. Any Keras optimizer, as well as the Padam and Yogi optimizers from keras_contrib, can be used.
If you don't want to simply use the default parameters of the chosen optimizer, instead give a config dict. See Explanation of str or config dict.
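For example, an optimizer given as a config dict might look like this (the keyword arguments are illustrative):

```python
optimizer = {'class_name': 'SGD',
             'config': {'momentum': 0.9, 'nesterov': True}}
```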
lr_finder : dict
The `'use'` key is a bool deciding whether or not to use the learning rate finder as implemented in `custom_classes.py`.
The `'scan_range'` key is a list containing the minimum and maximum learning rate to be scanned over.
The `'epochs'` key is an int setting the number of epochs to use in the scan; 1-4 epochs is typically enough, depending on the size of the training set.
If `'prompt_for_input'` is `True`, the user will be asked, upon completing the learning rate finder scan, to input a range within which the cyclical learning rate schedule (see below) should vary. An example configuration is sketched below.
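A valid `lr_finder` dictionary could look like this (the values are illustrative):

```python
lr_finder = {'use': True,
             'scan_range': [1e-6, 1e-1],
             'epochs': 2,
             'prompt_for_input': False}
```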
lr_schedule : dict
The `'name'` key can be either `None` (when no learning rate schedule will be used), `'CLR'` (when the triangular schedule from Smith, 2015 will be used) or `'SGDR'` (when SGD with warm restarts, by Loshchilov and Hutter, 2017, will be used).
Besides a `'name'`, keyword arguments can be passed to the chosen callback, e.g. `{'name':'CLR', **kwargs}`, where `kwargs = {'range':[1e-3, 5e-3], 'step_size_factor':5}`.
auto_lr : bool
See `get_auto_lr` in `model_building_functions.py`, which automatically sets a good learning rate based on the chosen optimizer and the batch size, taking the learning rate to be proportional to the square root of the batch size. The constant of proportionality varies from optimizer to optimizer, and probably from problem to problem; use the learning rate finder to find out which constant is suitable for your problem.
use_earlystopping : bool
Whether to use early stopping, with `min_delta=0.001` and `patience=150` (these can be changed in `model_container.py`).
restore_best_weights : bool
Whether to restore the model weights from its best epoch when training ends.
pretrained_model : dict
The `'use'` key is a boolean deciding whether or not to load pretrained weights. The `'weights_path'` key is the path to the weights of the pretrained network. If `'params_path'` is `None`, the parameters for the pretrained network are assumed to be in the parent folder of the `'weights_path'`.
`'layers_to_load'` is a list with the Keras names of the layers (or submodels) whose weights should be transferred from the pretrained model to the one at hand. These names must refer to the same structure in both the pretrained model and in the model at hand.
`'freeze_loaded_layers'` can be either a boolean (when, if `True`, all layers listed in `'layers_to_load'` will be frozen, or not, if `False`) or a list of bools with the same length as `'layers_to_load'` (when the first boolean in `'freeze_loaded_layers'` answers whether to freeze the first layer given by `'layers_to_load'` or not, etc.).
n_gpus : int
The number of GPUs to use for training.
data_generator : dict
Parameters concerning a DataGenerator, which is helpful if your data does not fit in memory, as it loads data in batches. Its current implementation (which you will most likely need to tailor to your pipeline) can be seen in `deepcalo.data_generator`.
The `'use'` key is a bool deciding whether to use a DataGenerator.
The `'n_workers'` key is an int that sets the number of CPU workers to use for preparing batches.
The `'max_queue_size'` key is an int that sets the upper limit on how many batches can be ready at any one time.
The `'path'` key is a str that gives the path to the dataset.
The `'n_points'` key is a dict with the keys `'train'` and `'val'` (or `'train'`, `'val'` and `'test'`), whose corresponding values are ints giving the number of datapoints in each set.
An example of a valid `data_generator` dictionary:

```python
{'use': True,
 'n_workers': 4,
 'max_queue_size': 10,
 'path': './my_data.h5',
 'n_points': {set_name: int(1e3) for set_name in ['train', 'val', 'test']}}
```
upsampling : dict
Dictionary of parameters for upsampling input images inside the network. This can be useful if the ability to downsample (which introduces translational invariance) is important but the input images are small.
The `'use'` key is a boolean deciding whether to upsample or not. `'wanted_size'` refers to the size that all images should be upsampled to before being concatenated. The `'interpolation'` argument is passed to the Keras layer `UpSampling2D`.
After upsampling, the cell image tensor is normalized so as to maintain the same amount of energy overall, but now spread out over the upsampled pixels.
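A valid `upsampling` dictionary might look like this (the values are illustrative):

```python
upsampling = {'use': True,
              'wanted_size': (56, 55),   # common (H, W) after upsampling
              'interpolation': 'nearest'}
```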
Keys concerning all submodels:
initialization : str or config dict
Initialization of the parameters of the submodel. Can be any initializer recognized by Keras.
If you don't want to simply use the default parameters of the chosen initializer, instead give a config dict. See Explanation of str or config dict.
normalization : str or config dict or None
Normalization layer. If not `None`, the chosen normalization layer is placed after every dense or convolutional layer in the submodel, i.e., before an activation function.
Can be any of `'batch'`, `'layer'`, `'instance'` or `'group'`. Note that the last three are implemented through a group normalization layer (which encompasses layer and instance normalization). This means that the name of the normalization layer when using `keras.utils.plot_model` will be the name of a group normalization layer when using any of the last three.
If you don't want to simply use the default parameters of the chosen normalization layer, instead give a config dict. See Explanation of str or config dict.
activation : str or config dict or None
Activation function of all dense or convolutional layers in the submodel, except for the very last one, if a `final_activation` variable is present.
Can be any of `'relu'`, `'leakyrelu'`, `'prelu'`, `'elu'` or `'swish'`. See `get_activation` in `model_building_functions.py` for examples of implementations of custom activation functions.
Is placed right after every normalization layer in the submodel, or, if `normalization` is `None`, right after every dense or convolutional layer in the submodel.
If you don't want to simply use the default parameters of the chosen activation, instead give a config dict. See Explanation of str or config dict.
layer_reg : dict with None or strs or config dicts as values
Layer regularization to be applied to all dense or convolutional layers in the submodel. This dict collects `kernel_regularizer`s, `bias_regularizer`s, `activity_regularizer`s, `kernel_constraint`s and `bias_constraint`s to be applied.
If you don't want to simply use the default parameters of the chosen regularizer, instead give a config dict. See Explanation of str or config dict.
An example of what is allowed:

```python
{'kernel_regularizer': 'l2',
 'bias_regularizer': {'class_name': 'l1',
                      'config': {'l': 1e-5}},
 'activity_regularizer': None,
 'kernel_constraint': {'class_name': 'max_norm',
                       'config': {'max_value': 3}},
 'bias_constraint': 'max_norm'}
```

Any of these keys can be left out to invoke the default value of `None`. If the dict is empty, no layer regularization will be applied.
dropout : float or dict or None
Dropout to use in the submodel. If `None`, no dropout or DropBlock layers will be added.
top:
Submodel for collecting inputs (e.g. from other submodels) and giving the output of the full model.
See "Keys concerning all submodels" for keys initialization
,
activation
, normalization
, layer_reg
and dropout
.
units : list of ints
The number of units in each dense layer; the last entry of `units` should be the number of desired outputs.
final_activation : str
For classification, use `'sigmoid'`, and use `'linear'` or `None` (or `'relu'` to enforce non-negativity) for regression.
cnn:
Submodel for processing images. Will only be used if `img_names` is not `None`.
See "Keys concerning all submodels" for keys initialization
,
activation
, normalization
, layer_reg
and dropout
.
cnn_type : str
Defaults to `'simple'`. Set to `'res'` to use residual blocks, as in He et al., 2016.
Setting `cnn_type` to some other string is a good way to implement other types of CNNs, which are then integrated into the framework. For instance, to use the ResNet18 of keras_contrib, set it to `'res18'`; see `model_building_functions.py` under `get_cnn` for how this is done.
conv_dim : int
`2` or `3`; whether to use 2D or 3D convolutions.
block_depths : list of ints
List with the number of convolutional layers for each block as elements. See the illustration here for what constitutes a block.
Note that if `cnn_type` is `'res'`, two convolutional layers are used per int, e.g. a `block_depths` value of `[1,2,2,2,2]` will result in a CNN with 18 convolutional layers, whereas the CNN would only have 9 convolutional layers if `cnn_type` had been `'simple'`.
n_init_filters : int
The number of filters in the first convolutional layer (the number of feature maps is doubled at the start of each subsequent block, cf. the architecture illustration above).
init_kernel_size : int or tuple
Kernel size of the first convolutional layer.
If an int is given and `conv_dim` is `2`, a kernel size of `(init_kernel_size, init_kernel_size)` is used. If `conv_dim` is instead `3`, a kernel size of `(init_kernel_size, init_kernel_size, 2)` is used.
If a tuple is given, its length must equal `conv_dim`.
rest_kernel_size : int or tuple
Kernel size of all but the first convolutional layer.
If an int is given and `conv_dim` is `2`, a kernel size of `(rest_kernel_size, rest_kernel_size)` is used. If `conv_dim` is instead `3`, a kernel size of `(rest_kernel_size, rest_kernel_size, 2)` is used.
If a tuple is given, its length must equal `conv_dim`.
cardinality : int
Grouped convolutions (combined with $1\times 1$ convolutions) will be used instead of the normal convolutions when `cardinality > 1`. Only supported for 2D convolutions.
use_squeeze_and_excite : bool
Whether to use the squeeze-and-excite block from Hu et al., 2017. For the `'simple'` `cnn_type` it will be inserted after the activation function (which comes after the normalization layer). For the `'res'` `cnn_type` it will be inserted right before the addition of the skip-connection.
Note that the $r$ hyperparameter of every squeeze-and-excite block is set to `16`, meaning that `n_init_filters` must be at least `16` as well, if `use_squeeze_and_excite` is `True`.
globalavgpool : bool
Whether to use global average pooling at the end of the CNN.
downsampling : str or None
One of `None` (no downsampling is used), `'avgpool'` (with `pool_size=2`), `'maxpool'` (with `pool_size=2`), or `'strided'` (when strided convolutions with stride and kernel size of 2 are used to downsample).
When one dimension is more than 1.5 times larger than another dimension, that (larger) dimension will be downsampled such that it is reduced by a factor of 3, instead of 2. This can be changed in `get_downsampling` in `model_building_functions.py`.
min_size_for_downsampling : int
The minimum size that any dimension over which convolutions are made (so excluding the samples and channels dimensions) must have if downsampling is to take place. This is to prevent downsampling to too-small images.
E.g., if 2D downsampling is attempted on a `(None, 7, 6, 4)` image tensor while `min_size_for_downsampling` is `6`, the downsampling goes through and the result would be `(None, 3, 3, 4)`. If, on the other hand, `min_size_for_downsampling` was `7`, the third dimension of the image tensor would be too small, and no downsampling would take place.
scalar_net:
Submodel for processing scalar variables. Will only be used if `scalar_names` is not `None`.
See "Keys concerning all submodels" for keys initialization
,
activation
, normalization
, layer_reg
and dropout
.
units : list of ints
The number of units in each dense layer.
connect_to : list of strs
Names of the submodels that should use the output of `scalar_net` as (part of) their input. Can contain either `'top'` and/or `'FiLM_gen'`. It can in principle also be empty, but you should rather turn off the use of scalar variables by setting `scalar_names` to `None`.
FiLM_gen:
Submodel for modulating the feature maps of the `cnn` submodel, called a FiLM generator. See this for an overview. Will only be used if the `connect_to` list of either `scalar_net` or `track_net` contains `'FiLM_gen'` (and those submodels are used).
See "Keys concerning all submodels" for keys initialization
,
activation
, normalization
, layer_reg
and dropout
.
use : bool
Whether to use the FiLM generator.
units : list of ints
The number of units in each dense layer.
track_net:
Submodel for processing tracks. Will only be used if `use_tracks` is `True`. Uses Deep Sets; see Zaheer et al., 2017.
See "Keys concerning all submodels" for keys initialization
,
activation
, normalization
, layer_reg
and dropout
.
phi_units : list of ints
The number of units in each dense layer of the $\phi$ network of the Deep Sets architecture, which is applied to each track vector independently.
rho_units : list of ints
The number of units in each dense layer of the $\rho$ network, which is applied after the aggregation.
connect_to : list of strs
Names of the submodels that should use the output of `track_net` as (part of) their input. Can contain either `'top'` and/or `'FiLM_gen'`. It can in principle also be empty, but you should rather turn off the use of tracks by setting `use_tracks` to `False`.
time_net:
Submodel for processing time images. Will only be used if `use_times` is `True`.
See "Keys concerning all submodels" for keys initialization
,
activation
, normalization
, layer_reg
and dropout
.
units : list of ints
The number of units in each dense layer, the last of which is followed by the `final_activation`.
use_res : bool
Whether to use residual (skip) connections between the dense layers given by `units`.
final_activation : str or config dict
Activation function to apply to the last dense layer, or to the input itself, in case `units` is empty.
The output of the chosen activation function should be in the range $[0;1]$.
Custom activations `'gauss'`, `'gauss_f'`, `'pgauss'` and `'pgauss_f'` have been implemented to use here. The "p" stands for "parametric", while the "f" stands for "flipped".
final_activation_init : list
List of initial weights for the parametric activation functions `'pgauss'` and `'pgauss_f'`. These contain a single parameter, namely the width of the Gaussian, and so `final_activation_init` should contain a single float, e.g., `final_activation_init` could be `[0.5]`.
If you don't want to simply use the default parameters of the chosen activation, instead give a config dict. See Explanation of str or config dict.
Explanation of str or config dict
Say you want to use the `RandomNormal` initializer of Keras to initialize the weights of some submodel. If you want to use the default values of the parameters of this class (`mean=0.0, stddev=0.05, seed=None`), you can simply give the str `'RandomNormal'` as the value for the `initialization` key described above.
However, if you want to pass some other parameters to the class, you can instead give a config dict as the value for the `initialization` key. A config dict must have two keys, `'class_name'` and `'config'`. The value corresponding to the `'class_name'` key should be a str, e.g. `'RandomNormal'`, while the value corresponding to the `'config'` key should be a dict containing the keyword arguments you wish to pass to the class (an empty `'config'` dict will use the default values).
You could for example give the following as the value corresponding to the `initialization` key:

```python
{'class_name': 'RandomNormal', 'config': {'stddev': 1.0}}
```
See the docs for `layer_reg` above for additional examples.
For a more technical definition of the config dict: it is what is returned from `keras.utils.serialize_keras_object(keras_object)`, where `keras_object` is the class instance you wish to create.
In most cases, aliases are set up such that multiple names for the same class are valid, e.g. if you want to use batch normalization as normalization in some submodel, you can pass any of `'batch'`, `'BatchNormalization'`, `'batch_norm'`, etc., as the `'class_name'`.
dirs : dict
Dictionary of directories to put logs in. Should contain the keys `'log'` (the directory to put all the other directories in), `'fig'` (for saving figures), `'saved_models'` (for saving models and/or weights of the models during training) and `'lr_finder'` (for storing the results of the learning rate finder, if used).
The function `deepcalo.utils.create_directories` will return such a dictionary (and create its contained directories).
save_figs : bool
Whether to save plots of the model and its submodels.
verbose : bool or int
Verbose output. Set to `2` to disable the progress bar for each epoch.
get_model
Creates the model, as defined by `params`, which it takes as its sole input.
Trains the model constructed by `get_model`.
Evaluates the model constructed by `get_model`, typically at the end of training using a test set.
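Putting the pieces together, a typical workflow could look roughly like this (a sketch; the names of the training and evaluation methods are assumptions based on the descriptions above):

```python
mc = ModelContainer(data=data, params=params, dirs=dirs)
mc.get_model(params)  # create the model defined by params
mc.train()            # assumed method name: train the model
mc.evaluate()         # assumed method name: evaluate, e.g. on the test set
```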
Note that models saved using `multi_gpu_model` cannot be loaded on a single GPU; see https://github.com/keras-team/keras/issues/9562.