Contrastive learning for enhancing feature extraction in anticancer peptides

This project provides a deep learning model that screens anticancer peptides (ACPs) from peptide sequences alone. Contrastive learning was applied to improve model performance, and it yielded better results than a model trained solely on a binary classification loss. In addition, two independent encoders were used in place of data augmentation, a technique that contrastive learning commonly relies on.
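
To make the two-encoder idea more concrete, the sketch below computes an InfoNCE-style contrastive loss in plain Python. It assumes that the two encoders' embeddings of the same peptide form the positive pair and that the other peptides in the batch act as negatives. This README does not state the exact loss, so the function name info_nce, the pairing scheme, and the toy embeddings are illustrative assumptions rather than the project's implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss where view_a[i] and view_b[i] embed the same peptide.

    Assumption: the two views come from two independent encoders rather than
    from two augmentations of the same input.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of this anchor to every embedding from the other encoder,
        # divided by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        # Numerically stable log-sum-exp; the positive pair sits at index i.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings (hypothetical): matching pairs agree, swapped pairs do not.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is close to zero when both encoders agree on each peptide and large when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.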

Dependencies

pytorch>=2.0.1
numpy>=1.25.2
biopython

Datasets

Datasets for model training were obtained from ACPred-LAF. Six benchmark datasets were used for model training:

ACP-Mixed-80
ACP2.0 main
ACP2.0 alternative
ACP500+ACP164
ACP500+ACP2710
LEE+Independent

For more detailed information, refer to this research article.

Model training

Use the following command to start model training:

python train.py --model_info {model_info} --batch_size {batch_size} --dropout_rate {dropout_rate}
                --lr {learning_rate} --epoch {maximum_training_epochs} --dataset {benchmark_dataset}
                --alpha {alpha} --beta {beta} --temp {temperature} --gpu {gpu_number}

model_info: Choose an encoder architecture from the ./model/model_params directory for model training. For example, --model_info ./model/model_params/cnn1.json.
batch_size: Batch size used during model training.
dropout_rate: Dropout rate applied during model training.
learning_rate: Learning rate set for model training.
maximum_training_epochs: Maximum number of training epochs.
benchmark_dataset: Select one dataset from the six available benchmark datasets for model training.
- Options
  - ACP_Mixed_80: ACP-Mixed-80 dataset
  - ACP2_main: ACP2.0 main dataset
  - ACP2_alter: ACP2.0 alternative dataset
  - ACP500_ACP164: ACP500+ACP164 dataset
  - ACP500_ACP2710: ACP500+ACP2710 dataset
  - LEE_Indep: LEE+Independent dataset
alpha: Adjusts the balance between cross-entropy and contrastive loss components. Range: 0.0 to 1.0.
beta: Balances the two types of cross-entropy losses (cross-entropy loss 1 and 2).
- Options
  - 0: Only cross-entropy loss 1 is used for model training.
  - 0.5: Both cross-entropy loss 1 and 2 are used for model training.
  - 1: Only cross-entropy loss 2 is used for model training.
temperature: Temperature parameter in contrastive loss calculation.
gpu: GPU number to be used for model training, as identified by the nvidia-smi command. Use -1 for CPU training.
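
The roles of alpha and beta can be sketched as a weighted sum. The beta behavior follows the options listed above (0 keeps only cross-entropy loss 1, 1 keeps only cross-entropy loss 2). The linear interpolation between them, the direction of alpha (here it weights the cross-entropy term and 1 - alpha weights the contrastive term), and the function name combined_loss are assumptions made for illustration; train.py may combine the terms differently.

```python
def combined_loss(ce1, ce2, contrastive, alpha, beta):
    """Illustrative training objective built from three scalar loss values."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0.0, 1.0]")
    # beta = 0 keeps only cross-entropy loss 1, beta = 1 keeps only loss 2,
    # and beta = 0.5 averages the two.
    cross_entropy = (1.0 - beta) * ce1 + beta * ce2
    # Assumption: alpha weights cross-entropy, (1 - alpha) weights contrastive.
    return alpha * cross_entropy + (1.0 - alpha) * contrastive


# Hypothetical loss values for a single training step.
only_ce1 = combined_loss(2.0, 4.0, 10.0, alpha=1.0, beta=0.0)
mixed = combined_loss(2.0, 4.0, 10.0, alpha=0.5, beta=0.5)
```

Under these assumptions, alpha = 1.0 switches the contrastive term off entirely, which corresponds to the baseline trained on classification loss only.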

Inference (Predicting ACPs)

To predict anticancer peptides (ACPs) from peptide sequences alone, prepare your list of peptide sequences in FASTA format. For more detailed information about the FASTA format, refer to this link.
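
As a quick illustration of the expected input, the sketch below parses a small FASTA text with the standard library only. The record names and sequences are made-up examples, and read_fasta is a hypothetical helper, not part of this package. In practice, Biopython's SeqIO.parse(handle, "fasta") is the usual way to read such files.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (identifier, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may wrap; join them into one string.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical input: each record is a ">" header line followed by the sequence.
example = """>peptide_1
FLPIIAKLLSGLL
>peptide_2
GLFDIVKKVVGAL
GSL
"""
records = read_fasta(example)
```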

Use the following command to run the inference:

python inf.py --batch_size {batch_size} --model_type {model_type}
              --device {device} --output {output_file}

batch_size: The batch size used during inference.
model_type: The optimized model to use. Six optimized models are available for predicting ACPs, each trained on one of the six benchmark datasets. The recommended default is ACP_Mixed_80.
- Options
  - ACP_Mixed_80: The optimized model that was trained using the ACP-Mixed-80 benchmark dataset.
  - ACP2_main: The optimized model that was trained using the ACP2.0 main benchmark dataset.
  - ACP2_alter: The optimized model that was trained using the ACP2.0 alternative benchmark dataset.
  - ACP500_ACP164: The optimized model that was trained using the ACP500+ACP164 benchmark dataset.
  - ACP500_ACP2710: The optimized model that was trained using the ACP500+ACP2710 benchmark dataset.
  - LEE_Indep: The optimized model that was trained using the LEE+Independent benchmark dataset.
device: The device used for predicting ACPs.
- Options
  - cpu
  - gpu
output_file: The file where prediction results will be saved.
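
A filled-in invocation might look like the sketch below, which assembles the argument list and validates the two option-based parameters before anything is run. The batch size of 32 and the output name predictions.txt are placeholder values, and build_inference_command is a hypothetical helper. Only the flags shown in the command above are used.

```python
import shlex

# Option values taken from the lists above.
MODEL_TYPES = (
    "ACP_Mixed_80", "ACP2_main", "ACP2_alter",
    "ACP500_ACP164", "ACP500_ACP2710", "LEE_Indep",
)
DEVICES = ("cpu", "gpu")


def build_inference_command(batch_size, model_type, device, output_file):
    """Return the inf.py invocation as an argument list (nothing is executed)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type}")
    if device not in DEVICES:
        raise ValueError(f"unknown device: {device}")
    return [
        "python", "inf.py",
        "--batch_size", str(batch_size),
        "--model_type", model_type,
        "--device", device,
        "--output", output_file,
    ]


# Placeholder values; adjust them to your own setup.
command = build_inference_command(32, "ACP_Mixed_80", "cpu", "predictions.txt")
print(shlex.join(command))
# python inf.py --batch_size 32 --model_type ACP_Mixed_80 --device cpu --output predictions.txt
```

The returned list can be passed to subprocess.run from the project directory if you prefer to launch inference from a script.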

Note: Because the maximum peptide sequence length differs across the benchmark datasets, each model type restricts the maximum length of its input peptide sequences.

Model Type	Maximum Number of Amino Acid Residues
ACP2_main	50
ACP2_alter	50
LEE_Indep	95
ACP500_ACP164	206
ACP500_ACP2710	206
ACP_Mixed_80	207
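
One way to respect these limits is to filter sequences before running inference. The sketch below uses the limits from the table above. The helper name split_by_length and the sample peptides are hypothetical, and how inf.py itself treats over-length sequences is not stated here.

```python
# Maximum number of amino acid residues per model type, from the table above.
MAX_LENGTH = {
    "ACP2_main": 50,
    "ACP2_alter": 50,
    "LEE_Indep": 95,
    "ACP500_ACP164": 206,
    "ACP500_ACP2710": 206,
    "ACP_Mixed_80": 207,
}


def split_by_length(records, model_type):
    """Split (name, sequence) pairs into those within the limit and those over it."""
    limit = MAX_LENGTH[model_type]
    accepted = [(name, seq) for name, seq in records if len(seq) <= limit]
    rejected = [(name, seq) for name, seq in records if len(seq) > limit]
    return accepted, rejected


# Hypothetical peptides of 30 and 120 residues.
peptides = [("short", "A" * 30), ("long", "A" * 120)]
ok, too_long = split_by_length(peptides, "LEE_Indep")
```

With the LEE_Indep limit of 95 residues, only the 30-residue peptide is accepted; the same input passes in full for ACP_Mixed_80.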

acppred
Release 0.0.2
