pathway-abstract-classifier

A tool to classify articles containing biological pathway information


License

MIT

Install

$ pip install pathway-abstract-classifier==0.2.0

Requirements

This project requires Python >=3.8.

Installation

Set up a virtual environment. Here, we use miniconda to create an environment named testenv:

$ conda create --name testenv python=3.8
$ conda activate testenv
$ pip install pathway-abstract-classifier
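After installing, you can quickly confirm that the interpreter meets the Python >=3.8 requirement and that the package is visible in the active environment. The sketch below uses only the standard library (importlib.metadata, which itself needs Python 3.8+). The helper name check_environment is hypothetical and not part of the package.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(package: str = "pathway-abstract-classifier") -> dict:
    """Report whether the interpreter meets the >=3.8 requirement and
    which version (if any) of the package is installed."""
    python_ok = sys.version_info >= (3, 8)
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {"python_ok": python_ok, "installed_version": installed}
```

Calling check_environment() inside the activated testenv should report python_ok as True and a version string for the package.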

Usage

Demo

Run a simple demo using Streamlit.

Because this project was built with Poetry, you'll need to install Poetry to get its development dependencies.

From the root of your local clone of the GitHub repository:

$ poetry install

Now run the app:

$ streamlit run ./pathway_abstract_classifier/app.py

Example

Classify one article with biological pathway information and one that clearly does not.

import ktrain
from cached_path import cached_path

# Point this URL at the newest release to get the newest model
model_path = cached_path("https://github.com/PathwayCommons/pathway-abstract-classifier/releases/download/pretrained-models/title_abstract_model.zip", extract_archive=True)

# The following uses basic ktrain (https://github.com/amaiya/ktrain) syntax.

# Load model
model = ktrain.load_predictor(model_path)

# Example articles
titles = [
    "YTHDC1-mediated augmentation of miR-30d in repressing pancreatic tumorigenesis via attenuation of RUNX1-induced transcriptional activation of Warburg effect",
    "Loss of 15-lipoxygenase disrupts T reg differentiation altering their pro-resolving functions"
]

abstracts = [
    "Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal human cancers. It thrives in a malnourished environment; however, little is known about the mechanisms by which PDAC cells actively promote aerobic glycolysis to maintain their metabolic needs. Gene Expression Omnibus (GEO) was used to identify differentially expressed miRNAs. The expression pattern of miR-30d in normal and PDAC tissues was studied by in situ hybridization. The role of miR-30d/RUNX1 in vitro and in vivo was evaluated by CCK8 assay and clonogenic formation as well as transwell experiment, subcutaneous xenograft model and liver metastasis model, respectively. Glucose uptake, ATP and lactate production were tested to study the regulatory effect of miR-30d/RUNX1 on aerobic glycolysis in PDAC cells. Quantitative real-time PCR, western blot, Chip assay, promoter luciferase activity, RIP, MeRIP, and RNA stability assay were used to explore the molecular mechanism of YTHDC1/miR-30d/RUNX1 in PDAC. Here, we discover that miR-30d expression was remarkably decreased in PDAC tissues and associated with good prognosis, contributed to the suppression of tumor growth and metastasis, and attenuation of Warburg effect. Mechanistically, the m6A reader YTHDC1 facilitated the biogenesis of mature miR-30d via m6A-mediated regulation of mRNA stability. Then, miR-30d inhibited aerobic glycolysis through regulating SLC2A1 and HK1 expression by directly targeting the transcription factor RUNX1, which bound to the promoters of the SLC2A1 and HK1 genes. Moreover, miR-30d was clinically inversely correlated with RUNX1, SLC2A1 and HK1, which function as adverse prognosis factors for overall survival in PDAC tissues. Overall, we demonstrated that miR-30d is a functional and clinical tumor-suppressive gene in PDAC. Our findings further uncover that miR-30d is a novel target for YTHDC1 through m6A modification, and miR-30d represses pancreatic tumorigenesis via suppressing aerobic glycolysis.",
    "Regulatory T-cells (Tregs) are central in the maintenance of homeostasis and resolution of inflammation. However, the mechanisms that govern their differentiation and function are not completely understood. Herein, we demonstrate a central role for the lipid mediator biosynthetic enzyme 15-lipoxygenase (ALOX15) in regulating key aspects of Treg biology. Pharmacological inhibition or genetic deletion of ALOX15 in Tregs decreased FOXP3 expression, altered Treg transcriptional profile and shifted their metabolism. This was linked with an impaired ability of Alox15-deficient cells to exert their pro-resolving actions, including a decrease in their ability to upregulate macrophage efferocytosis and a downregulation of interferon gamma expression in Th1 cells. Incubation of Tregs with the ALOX15-derived specilized pro-resolving mediators (SPM)s Resolvin (Rv)D3 and RvD5n-3 DPA rescued FOXP3 expression in cells where ALOX15 activity was inhibited. In vivo, deletion of Alox15 led to increased vascular lipid load and expansion of Th1 cells in mice fed western diet, a phenomenon that was reversed when Alox15-deficient mice were reconstituted with wild type Tregs. Taken together these findings demonstrate a central role of pro-resolving lipid mediators in governing the differentiation of naive T-cells to Tregs."
]

# Concatenate titles and abstracts with the [SEP] token expected by BERT-based models
sep_token = model.preproc.get_tokenizer().sep_token
texts = [" ".join([title, sep_token, abstract]) for title, abstract in zip(titles, abstracts)]

# Make predictions. ktrain may emit a UserWarning, which you can safely ignore.
predictions = model.predict(texts)

# Verify that both articles were classified correctly (1 = contains pathway information)
assert predictions == [1, 0]
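When classifying more than a handful of articles, it can help to wrap the title/abstract concatenation shown above in a small function that also checks its inputs. The sketch below is a hypothetical helper, not part of the package; its default separator "[SEP]" is an assumption matching standard BERT tokenizers, and in practice you would pass model.preproc.get_tokenizer().sep_token as in the example.

```python
from typing import List, Sequence


def build_texts(titles: Sequence[str], abstracts: Sequence[str], sep_token: str = "[SEP]") -> List[str]:
    """Join each title and abstract around the separator token,
    following the " ".join([title, sep_token, abstract]) pattern."""
    if len(titles) != len(abstracts):
        # zip() would silently drop the unmatched items, so fail loudly instead
        raise ValueError("titles and abstracts must have the same length")
    return [" ".join([title, sep_token, abstract]) for title, abstract in zip(titles, abstracts)]
```

The length check is the main design choice here: a missing abstract would otherwise shift nothing visibly but quietly drop an article from the predictions.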

Testing

From the root of your local clone of the GitHub repository:

$ poetry install

Run the test script:

$ ./test.sh

Under the hood, the tests are run with pytest. The test script also runs a lint check with flake8.
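The contents of test.sh are not reproduced here. As a rough, hypothetical Python equivalent of the two steps described above, the sketch below runs pytest and then flake8 through the current interpreter (so the active environment's tools are used) and stops at the first failure. The function names and the choice to run both tools on the repository root are assumptions.

```python
import subprocess
import sys
from typing import List


def check_commands(path: str = ".") -> List[List[str]]:
    """Commands for the pytest suite and the flake8 lint check."""
    return [
        [sys.executable, "-m", "pytest", path],
        [sys.executable, "-m", "flake8", path],
    ]


def run_checks(path: str = ".") -> bool:
    """Run each command in turn; return True only if every one exits with code 0."""
    for cmd in check_commands(path):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return False  # stop at the first failing step
    return True
```

Calling run_checks() from the repository root after poetry install runs both steps; use ./test.sh as the authoritative check.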

Publishing a release

A GitHub workflow automatically versions this package and releases it to PyPI whenever commits are pushed directly to main or a pull request is merged into main. By default, a push or merge to main bumps the patch version.

We use Python Semantic Release (PSR) to manage versioning. PSR scans commit messages and, when a message follows a well-defined structure, bumps the version in accordance with semver.
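For reference, the plain semver rule for each bump level looks like this. This is an illustrative sketch of semantic versioning itself, not PSR's code; it only handles plain MAJOR.MINOR.PATCH strings (no pre-release or build tags), and PSR has its own settings, for example for how breaking changes are treated while the major version is still 0.

```python
def bump_version(current: str, level: str) -> str:
    """Return the next semantic version for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if level == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix
    raise ValueError(f"unknown bump level: {level!r}")
```

Starting from 0.2.0, a patch bump gives 0.2.1, a minor bump 0.3.0, and a major bump 1.0.0.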

For a patch bump:

$ git commit -m "fix(ncbiutils): some comment for this patch version"

For a minor bump:

$ git commit -m "feat(ncbiutils): some comment for this minor version bump"

For a major version bump (breaking change):

$ git commit -m "feat(mod_plotting): some comment for this major version bump" -m "BREAKING CHANGE: other footer text."
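To make the mapping between commit messages and bump levels concrete, here is a simplified, hypothetical classifier for conventional-commit messages. It is not PSR's parser: it only recognises the fix and feat types and a "BREAKING CHANGE:" footer line, and returns None for anything else (it does not model the workflow's default patch bump). Note that each extra -m flag to git commit adds a separate paragraph, so the footer arrives after a blank line.

```python
import re
from typing import Optional

# Header: a lowercase type, an optional "(scope)", then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?: .+")


def bump_level(message: str) -> Optional[str]:
    """Simplified mapping of a commit message to 'major', 'minor', 'patch' or None."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not a conventional-commit header
    if any(line.startswith("BREAKING CHANGE:") for line in lines[1:]):
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return None  # other types (docs, chore, ...) are ignored in this sketch
```

For example, "fix(ncbiutils): ..." maps to "patch", "feat(ncbiutils): ..." to "minor", and a feat message with a BREAKING CHANGE footer to "major".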

Resources

See the tutorial (or open it in Colab) for a more detailed guide to usage. Importantly, the tutorial shows how to perform threshold moving, that is, choosing the probability cutoff above which an article is labelled as containing pathway information instead of using the default. The ktrain documentation and repository also contain several good tutorials.
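Threshold moving can be sketched in a few lines. With ktrain, per-class probabilities should be available from model.predict(texts, return_proba=True), and model.get_classes() shows which column belongs to which class; the sketch assumes the positive (pathway) class is in column 1, and the probabilities in the usage note are made-up values, not model output. The helper name apply_threshold is hypothetical.

```python
from typing import List, Sequence


def apply_threshold(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.5,
    positive_index: int = 1,
) -> List[int]:
    """Label an article 1 when its positive-class probability is >= threshold, else 0."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [1 if row[positive_index] >= threshold else 0 for row in probabilities]
```

With probs = [[0.30, 0.70], [0.85, 0.15]], apply_threshold(probs) gives [1, 0], while apply_threshold(probs, threshold=0.8) gives [0, 0]. Raising the threshold trades recall for precision; lowering it does the opposite.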