profile_binr

The PROFILE methodology for the binarisation and normalisation of RNA-seq data.

This is a Python interface to a set of normalisation and binarisation functions for RNA-seq data originally written in R.

This software package is based on the methodology developed by Jonas Béal, Arnau Montagud, Pauline Traynard, Emmanuel Barillot, and Laurence Calzone of the Computational Systems Biology of Cancer team at Institut Curie (contact-sysbio@curie.fr). It generalizes the original implementation, available as R Markdown notebooks at https://github.com/sysbio-curie/PROFILE, and provides a Python interface to it.

Installation

Using conda

The tool can be installed with conda from the profile_binr package in the colomoto channel. Note that some of its dependencies require the conda-forge channel.

conda install -c conda-forge colomoto::profile_binr

Using pip

Requirements

R (≥4.0)
R packages:
- mclust
- diptest
- moments
- magrittr
- tidyr
- dplyr
- tibble
- bigmemory
- doSNOW
- foreach
- glue

pip install profile_binr
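
The requirements above suggest that pip does not install R or the R packages for you, so it can be useful to check them before running the tool. The following is a minimal sketch and not part of profile_binr: the names `REQUIRED_R_PACKAGES`, `missing_r_packages_expr` and `find_missing_r_packages` are hypothetical, and the check assumes that `Rscript` is available on the PATH.

```python
import shutil
import subprocess

# Package names copied from the requirements list above.
REQUIRED_R_PACKAGES = [
    "mclust", "diptest", "moments", "magrittr", "tidyr", "dplyr",
    "tibble", "bigmemory", "doSNOW", "foreach", "glue",
]


def missing_r_packages_expr(packages):
    """Build an R expression that prints the packages not yet installed."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return f"cat(setdiff(c({quoted}), rownames(installed.packages())))"


def find_missing_r_packages(packages=REQUIRED_R_PACKAGES):
    """Return the missing R packages, or None if Rscript is not on the PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", missing_r_packages_expr(packages)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

Calling `find_missing_r_packages()` returns an empty list when every listed package is already installed.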

Usage

The following is a minimal example:

from profile_binr import ProfileBin
import pandas as pd

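# your data is assumed to contain observations (cells) as
# rows and genes as columns; the first CSV column is used
# as the index so that it holds the cell identifiers
data = pd.read_csv("path/to/your/data.csv", index_col=0)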
data.head()

	Clec1b	Kdm3a	Coro2b	8430408G22Rik	Clec9a	Phf6	Usp14	Tmem167b
cell_id
HSPC_025	0.0	4.891604	1.426148	0.0	0.0	2.599758	2.954035	6.357369
HSPC_031	0.0	6.877725	0.000000	0.0	0.0	2.423483	1.804914	0.000000
HSPC_037	0.0	0.000000	6.913384	0.0	0.0	2.051659	8.265465	0.000000
LT-HSC_001	0.0	0.000000	8.178374	0.0	0.0	6.419817	3.453502	2.579528
HSPC_001	0.0	0.000000	9.475577	0.0	0.0	7.733370	1.478900	0.000000
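
Expression matrices are often stored the other way round, with genes as rows and cells as columns. A minimal pandas sketch of reorienting such a table before handing it to ProfileBin follows. The small `genes_by_cells` frame is a hypothetical stand-in built from a few values in the preview above.

```python
import pandas as pd

# Hypothetical genes-by-cells table (the opposite of the expected layout).
genes_by_cells = pd.DataFrame(
    {"HSPC_025": [4.891604, 2.599758], "HSPC_031": [6.877725, 2.423483]},
    index=["Kdm3a", "Phf6"],
)

# Transpose so that rows are cells (observations) and columns are genes.
data = genes_by_cells.T
data.index.name = "cell_id"
```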

# create the binarisation instance using the dataframe
# with the index containing the cell identifier
# and the columns being the gene names
probin = ProfileBin(data)

# compute the criteria used to binarise/normalise the data.
# This method uses a parallel implementation; you can specify
# the number of workers with an integer
probin.fit(8) # fit using 8 parallel workers

# Look at the computed criteria
probin.criteria.head(8)

	Dip	BI	Kurtosis	DropOutRate	MeanNZ	DenPeak	Amplitude	Category
Clec1b	0.358107	1.635698	54.017736	0.876208	1.520978	-0.007249	8.852181	ZeroInf
Kdm3a	0.000000	2.407548	-0.784019	0.326087	3.847940	0.209239	10.126676	Bimodal
Coro2b	0.000000	2.320060	7.061604	0.658213	2.383819	0.004597	9.475577	ZeroInf
8430408G22Rik	0.684454	3.121069	21.729044	0.884058	2.983472	0.005663	9.067857	ZeroInf
Clec9a	1.000000	2.081717	140.089285	0.965580	2.280293	-0.009361	9.614233	Discarded
Phf6	0.000000	1.988667	-1.389024	0.035628	5.025501	2.017547	10.135226	Bimodal
Usp14	0.000000	2.208080	-1.224987	0.007850	6.109964	8.245570	11.088750	Bimodal
Tmem167b	0.000000	2.430813	0.093023	0.393720	3.448331	0.072982	9.486826	Bimodal
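
The criteria table prints like a pandas DataFrame, so ordinary pandas operations should apply to it. The sketch below rests on that assumption, which this README does not confirm. It rebuilds a few rows by hand from the output above instead of calling profile_binr, so the `criteria` frame is only a stand-in for `probin.criteria`.

```python
import pandas as pd

# Hand-copied subset of the criteria shown above; stands in for probin.criteria.
criteria = pd.DataFrame(
    {
        "DropOutRate": [0.876208, 0.326087, 0.965580, 0.035628],
        "Category": ["ZeroInf", "Bimodal", "Discarded", "Bimodal"],
    },
    index=["Clec1b", "Kdm3a", "Clec9a", "Phf6"],
)

# Number of genes assigned to each category.
category_counts = criteria["Category"].value_counts()

# Names of the genes classified as bimodal.
bimodal_genes = criteria.index[criteria["Category"] == "Bimodal"].tolist()
```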

# get binarised data (alternatively .binarise()):
my_bin = probin.binarize()
my_bin.head()

	Clec1b	Kdm3a	Coro2b	8430408G22Rik	Clec9a	Phf6	Usp14	Tmem167b
HSPC_025	NaN	1.0	NaN	NaN	NaN	0.0	0.0	1.0
HSPC_031	NaN	1.0	NaN	NaN	NaN	0.0	0.0	0.0
HSPC_037	NaN	0.0	1.0	NaN	NaN	0.0	1.0	0.0
LT-HSC_001	NaN	0.0	1.0	NaN	NaN	1.0	0.0	0.0
HSPC_001	NaN	0.0	1.0	NaN	NaN	1.0	0.0	0.0

# likewise for normalised data:
my_norm = probin.normalize()
my_norm.head()

	Kdm3a	Coro2b	Clec9a	Phf6	Usp14	Tmem167b
HSPC_025	9.786196e-01	0.184102	NaN	0.000801	8.318176e-05	9.999970e-01
HSPC_031	9.999981e-01	0.000000	NaN	0.000462	8.084114e-07	6.874397e-11
HSPC_037	4.408417e-09	0.892449	NaN	0.000145	9.999940e-01	6.874397e-11
LT-HSC_001	4.408417e-09	1.000000	NaN	0.991865	6.230178e-04	1.599753e-04
HSPC_001	4.408417e-09	1.000000	NaN	0.999865	2.171153e-07	6.874397e-11
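
Some entries of the binarised output above are NaN, meaning no binary value was assigned to that gene in that cell. A minimal pandas sketch of summarising and filtering such a result follows. The `my_bin` frame is a hand-copied subset of the output shown earlier, not a call to profile_binr.

```python
import numpy as np
import pandas as pd

# Hand-copied subset of the binarised output above; stands in for my_bin.
my_bin = pd.DataFrame(
    {
        "Clec1b": [np.nan, np.nan, np.nan],
        "Kdm3a": [1.0, 1.0, 0.0],
        "Coro2b": [np.nan, np.nan, 1.0],
    },
    index=["HSPC_025", "HSPC_031", "HSPC_037"],
)

# Fraction of cells that received a binary value (0 or 1) for each gene.
determined_fraction = my_bin.notna().mean()

# Keep only the genes that received a value in at least one cell.
informative = my_bin.dropna(axis="columns", how="all")
```

Whether to drop or keep the undetermined genes depends on the downstream analysis; the filter here is only one possible choice.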

References

Béal J, Montagud A, Traynard P, Barillot E and Calzone L (2019) Personalization of Logical Models With Multi-Omics Data Allows Clinical Stratification of Patients. Front. Physiol. 9:1965. doi:10.3389/fphys.2018.01965

profile-binr
Release 0.1.1
