Insufficient Gibbs Sampling

The InsufficientGibbs package offers functionalities for sampling from posterior parameters when only robust and insufficient statistics of the data are available. A comprehensive explanation of the underlying theory can be found in the associated paper on the arXiv. Additionally, the code responsible for generating all figures presented in the paper is located in the figures folder.

Abstract • Install • Tutorial • Examples • Add new distributions/models

Abstract

In some applied scenarios, the availability of complete data is restricted, often due to privacy concerns; only aggregated, robust and inefficient statistics derived from the data are made accessible. These robust statistics are not sufficient, but they demonstrate reduced sensitivity to outliers and offer enhanced data protection due to their higher breakdown point. We consider a parametric framework and propose a method to sample from the posterior distribution of parameters conditioned on various robust and inefficient statistics: specifically, the pairs (median, MAD) or (median, IQR), or a collection of quantiles. Our approach leverages a Gibbs sampler and simulates latent augmented data, which facilitates simulation from the posterior distribution of parameters belonging to specific families of distributions. A by-product of these samples from the joint posterior distribution of parameters and data given the observed statistics is that we can estimate Bayes factors based on observed statistics via bridge sampling. We validate and outline the limitations of the proposed methods through toy examples and an application to real-world income data.

`InsufficientGibbs` package

Install

Install latest release via pip

pip install InsufficientGibbs

For latest development version clone the repository and install via pip

git clone https://github.com/antoineluciano/Insufficient-Gibbs-Sampling
cd src/InsufficientGibbs
pip install .

Available distributions

Normal (Normal)
Cauchy (Cauchy)
Laplace(Laplace)
Gamma (Gamma) and its translated version (TranslatedGamma)
LogNormal (LogNormal) and its translated version (TranslatedLogNormal)
Weibull (Weibull) and its translated version (TranslatedWeibull)
Generalized Pareto (GeneralizedPareto)

Add Model to the end of their function names to use them as models.

Tutorial

Create prior distributions

For each parameter of your model, create a variable of type Distribution representing its associated prior distribution. You must specify its hyperparameters and optionally its name.

Create the model

Create a variable of type Model with your selected distribution and its predefined parameters as arguments.

Sample from the posterior given insufficient statistics

To sample from the posterior of the model parameters given the observed statistics, you can use the three functions of the class Model:

Gibbs_Quantile: the observed data consists of a sequence of quantiles.
Gibbs_med_IQR: the observed data consists of the median and the interquartile range (IQR).
Gibbs_med_MAD: the observed data consists of the median and the median absolute deviation (MAD).

Examples

Cauchy distribution with Normal and Gamma priors

from InsufficientGibbs.Distribution import Normal, InverseGamma
from InsufficientGibbs.Models import CauchyModel

# Creating the prior distributions variables
mu = Normal(0,10, name= "x_0")
sigma = Gamma(2,2, name = "gamma")

# Creating the model variable
model = CauchyModel(mu,sigma)

T, N = 10000, 100

# Quantile case 
q1, med, q3 = -1, 0, 1
probs = [.25, .5, .75]
Cauchy_Q = model.Gibbs_Quantiles(T, N, [q1, med, q3], probs)

med, IQR = 0, 2
Cauchy_med_IQR = model.Gibbs_med_IQR(T, N, med, IQR) 

# Median, MAD case
med, MAD = 0, 1
Cauchy_med_MAD = model.Gibbs_med_MAD(T, N, med, MAD)

You can display the chain by using the display_chains function.

How to add new distributions and models

As outlined in our paper, our method is applicable to all continuous distributions with compact support, as it only requires simulation according to a truncated version. Therefore, we provide guidance on how users can seamlessly integrate their own models by simply adding instances to the Distribution and Model classes.

For the Distribution class, an instance should include an initialization function __init__defining a variable _distribution containing all relevant distribution information (e.g., pdf, cdf, ppf) and a domain function describing the domain of definition.

Similarly, for the Model class, an instance should include an initialization function __init__, along with functions returning initialization parameters for the three studied scenarios (Init_theta_Quantile, Init_theta_med_MAD, and Init_theta_med_IQR).

To illustrate, we present examples of distributions. First, we showcase a distribution already implemented in SciPy (the Pareto distribution), followed by a custom implementation (the Pareto Type II distribution).

Example of the Pareto Distribution

Add the the instance Pareto of the class Distribution in the file Distribution.py:

from scipy.stats import pareto

class Pareto(Distribution):
    """
    Container for Pareto Distribution

    Parameters
    ----------
    scale : float
        scale of the pareto
    shape : float
        shape of the pareto
    name : str
    """
    def __init__(self, 
            scale: float=1,
            shape: float=1,
            name: str="") -> None:
        self.scale = PositiveContinuousVariable(scale, name='scale')
        self.shape = PositiveContinuousVariable(shape, name='shape')
        self.name = name
        self._distribution = pareto(b=self.shape.value, scale=self.scale.value)
        parameters_dict = {'scale':self.scale, 'shape':self.shape}
        super().__init__(parameters_dict, self.name)

    def domain(self) -> Tuple[float, float]:
        return (self.parameters_value["scale"], float('inf'))

Add the the instance ParetoModel of the class Model in the file Models.py:

class ParetoModel(Model):
    def __init__(self, scale:Distribution, shape:Distribution) -> None:
        if scale.name == "": scale.name = "scale"
        if shape.name == "": shape.name = "shape"
        
        if scale.name == shape.name:
            raise ValueError("parameters must have different names.")
        
        self.scale = scale
        self.shape = shape
        self.type_distribution = Pareto
        self.parameters_dict = {scale.name: scale, shape.name: shape}
        super().__init__(self.parameters_dict)
        self.distrib_name = "pareto"
        self.init_method = "naive"
        
    def domain(self) -> Tuple[float, float]:
        return (0, float('inf'))
    
    def Init_theta_Quantile(self, Q, P):
        scale = Q[0]-1
        shape = (Q[-1] - Q[0]) / (Pareto(scale=scale)._distribution.ppf(P[-1]) - Pareto(scale=scale)._distribution.ppf(P[0]))
        return {self.scale.name: scale, self.shape.name: shape}
    
    def Init_theta_med_MAD(self, med, MAD):
        scale = 1
        shape = 1.5
        return {self.scale.name: scale, self.shape.name: shape}
    
    def Init_theta_med_IQR(self, med, IQR):
        scale = 1
        shape = 1.5
        return {self.scale.name: scale, self.shape.name: shape}

Example of the Pareto Type II distribution

First, define a class pareto2 that plays the same role as the functions in SciPy.

class pareto2:
    def __init__(self, loc=0, scale=1,shape=1):
        self.loc = loc
        self.scale = scale
        self.shape = shape
        
    def pdf(self,x):
        x=np.array(x)
        return np.where(x>=self.loc,(self.shape/self.scale)*(1+(x-self.loc)/self.scale)**(-self.shape-1),0)
    
    def cdf(self,x):
        x=np.array(x)
        return np.where(x>=self.loc,1-(1+(x-self.loc)/self.scale)**(-self.shape),0)
    def ppf(self,x):
        x=np.array(x)
        return self.loc+self.scale*((1-x)**(-1/self.shape)-1)
    def logpdf(self,x):
        x=np.array(x)
        return np.where(x>=self.loc,np.log(self.shape)-np.log(self.scale)-(self.shape+1)*np.log(1+(x-self.loc)/self.scale),-np.inf)
    def rvs(self,size):
        return self.ppf(np.random.random(size))

Then, add the instance ParetoType2 of the class Distribution using the above class pareto2 in the file Distribution.py:

class ParetoType2(Distribution):
    
    def __init__(self,
            loc: float=0,
            scale: float=1,
            shape: float=1,
            name: str="",
            theta: list = []) -> None:
        if theta!=[]:
            loc,scale,shape = theta
        self.loc = ContinuousVariable(loc, name='loc')
        self.scale = PositiveContinuousVariable(scale, name='scale')
        self.shape = PositiveContinuousVariable(shape, name='shape')
        self.name = name
        self._distribution = pareto2(loc=self.loc.value, scale=self.scale.value,shape=self.shape.value)
        parameters_dict = {'loc': self.loc, 'scale':self.scale, 'shape':self.shape}
        super().__init__(parameters_dict, self.name)
        
        
    def domain(self) -> Tuple[float, float]:
        return (self.parameters_value["loc"], float('inf'))

Finally, add the instance ParetoType2Model of the class Model in the file Models.py:

class ParetoType2Model(Model):
    def __init__(self, loc:Distribution, scale:Distribution, shape:Distribution) -> None:
        if loc.name == "": loc.name = "loc"
        if scale.name == "": scale.name = "scale"
        if shape.name == "": shape.name = "shape"
        
        if loc.name == scale.name or loc.name == shape.name or scale.name == shape.name:
            raise ValueError("parameters must have different names.")
        
        self.loc = loc
        self.scale = scale
        self.shape = shape
        self.type_distribution = ParetoType2
        self.parameters_dict = {loc.name: loc, scale.name: scale, shape.name: shape}
        super().__init__(self.parameters_dict)
        self.distrib_name = "pareto_type2"
        self.init_method = "naive"
        
    def domain(self) -> Tuple[float, float]:
        return (self.parameters_value["loc"], float('inf'))
    
    def Init_theta_Quantile(self, Q, P):
        loc = 2*Q[0]-Q[1]
        shape = 2.5
        scale = (Q[-1] - Q[0]) / (ParetoType2(loc=loc, shape=shape)._distribution.ppf(P[-1]) - ParetoType2(loc=loc, shape=shape)._distribution.ppf(P[0]))
        return {self.loc.name: loc, self.scale.name: scale, self.shape.name: shape}
    
    def Init_theta_med_MAD(self, med, MAD):
        loc = med-2*MAD
        scale = 1
        shape = 1.5
        return {self.loc.name: loc, self.scale.name: scale, self.shape.name: shape}
    
    def Init_theta_med_IQR(self, med, IQR):
        loc = med-IQR
        scale = 1
        shape = 1.5
        return {self.loc.name: loc, self.scale.name: scale, self.shape.name: shape}

InsufficientGibbs
Release 1.0.5

Release 1.0.5

1.2.1

1.2.0

1.1.10

1.1.9

1.1.8

1.1.7

1.1.6

1.1.5

1.1.4

1.0.9

Documentation

Insufficient Gibbs Sampling

Table of contents

Abstract

`InsufficientGibbs` package

Install

Available distributions

Tutorial

Create prior distributions

Create the model

Sample from the posterior given insufficient statistics

Examples

Cauchy distribution with Normal and Gamma priors

How to add new distributions and models

Example of the Pareto Distribution

Example of the Pareto Type II distribution

Stats

Development practices

Releases

Contributors

InsufficientGibbs Release 1.0.5

Release 1.0.5 Toggle Dropdown 1.2.1 1.2.0 1.1.10 1.1.9 1.1.8 1.1.7 1.1.6 1.1.5 1.1.4 1.0.9

Documentation

Insufficient Gibbs Sampling

Table of contents

Abstract

InsufficientGibbs package

Install

Available distributions

Tutorial

Create prior distributions

Create the model

Sample from the posterior given insufficient statistics

Examples

Cauchy distribution with Normal and Gamma priors

How to add new distributions and models

Example of the Pareto Distribution

Example of the Pareto Type II distribution

Stats

Development practices

Releases

Contributors

InsufficientGibbs
Release 1.0.5

Release 1.0.5

1.2.1

1.2.0

1.1.10

1.1.9

1.1.8

1.1.7

1.1.6

1.1.5

1.1.4

1.0.9

`InsufficientGibbs` package