
Neural networks for MolMap generated features




Neural networks for regression and classification on molecular data, using MolMap-generated features.


This package implements the neural network architectures originally presented in the MolMap package, with two important differences:

  • The package is written using literate programming: all functionality is implemented, tested, and documented together in Jupyter notebooks. You can find the documentation on the package website.
  • The models are implemented in PyTorch.


First install MolMap and ChemBench, following their detailed installation guide, then simply run

pip install molmapnets

How to use the package

We need ChemBench for the datasets, MolMap for feature extraction, and finally molmapnets for the neural networks.

from chembench import dataset
from molmap import MolMap
from molmapnets.data import SingleFeatureData, DoubleFeatureData  # module path assumed; it is missing in the source line
from molmapnets.models import MolMapRegression

And for model training we also need torch

import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader, random_split

Load and process data, using the eSOL dataset here for regression

data = dataset.load_ESOL()
total samples: 1128
descriptor = MolMap(ftype='descriptor', metric='cosine')
descriptor.fit(method='umap', min_dist=0.1, n_neighbors=15)
2021-07-23 13:50:53,798 - INFO - [bidd-molmap] - Applying grid feature map(assignment), this may take several minutes(1~30 min)
2021-07-23 13:50:56,904 - INFO - [bidd-molmap] - Finished

Feature extraction

X = descriptor.batch_transform(data.x)
100%|##########| 1128/1128 [06:08<00:00,  2.78it/s]

Prepare data for model training

esol = SingleFeatureData(data.y, X)

Train, validation, and test split

train, val, test = random_split(esol, [904,112,112], generator=torch.Generator().manual_seed(7))
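
The split sizes above are hard-coded for the 1128 ESOL samples. As a minimal sketch, the lengths can instead be derived from the dataset size so that they always sum to the total, which random_split requires. The helper name split_lengths and the 10%/10% fractions are assumptions made for illustration; they are not part of molmapnets.

```python
def split_lengths(n_samples, val_frac=0.1, test_frac=0.1):
    """Return (train, val, test) lengths that sum exactly to n_samples."""
    n_val = int(n_samples * val_frac)    # rounded down
    n_test = int(n_samples * test_frac)  # rounded down
    n_train = n_samples - n_val - n_test  # remainder goes to training
    return n_train, n_val, n_test


print(split_lengths(1128))  # (904, 112, 112)
```

The result could then be passed on as random_split(esol, list(split_lengths(len(esol))), generator=...), assuming the dataset object reports its length via len().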

Batch data loader

train_loader = DataLoader(train, batch_size=8, shuffle=True)
val_loader = DataLoader(val, batch_size=8, shuffle=True)
test_loader = DataLoader(test, batch_size=8, shuffle=True)

Initialise model

model = MolMapRegression()

epochs = 5
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

Train the model. Users are encouraged to tweak the training loop to achieve better performance

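model.to(device)  # keep the model on the same device as the batches

for epoch in range(epochs):

    running_loss = 0.0
    for i, (xb, yb) in enumerate(train_loader):

        xb, yb = xb.to(device), yb.to(device)

        # zero gradients
        optimizer.zero_grad()

        # forward propagation
        pred = model(xb)

        # loss calculation
        loss = criterion(pred, yb)

        # backward propagation and parameter update
        loss.backward()
        optimizer.step()

        # accumulate the loss for the epoch summary
        running_loss += loss.item()
    print('[Epoch: %2d] Training loss: %.3f' %
          (epoch + 1, running_loss / (i+1)))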

print('Training finished')

[Epoch:  1] Training loss: 4.530
[Epoch:  2] Training loss: 1.803
[Epoch:  3] Training loss: 1.541
[Epoch:  4] Training loss: 1.209
[Epoch:  5] Training loss: 1.092
Training finished
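
The loop above reports only the training loss. One possible way to check the model on held-out data is sketched below: it computes the root-mean-square error over a data loader with gradients disabled. The helper evaluate_rmse is hypothetical and not part of molmapnets, and the toy linear model and random tensors are stand-ins so that the sketch runs on its own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_rmse(model, loader, device):
    """Return the root-mean-square error of `model` over all samples in `loader`."""
    model.eval()  # switch layers such as dropout or batch norm to inference mode
    squared_error, n_samples = 0.0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            pred = model(xb)
            squared_error += ((pred - yb) ** 2).sum().item()
            n_samples += yb.numel()
    return (squared_error / n_samples) ** 0.5


# Stand-in data and model (assumptions for illustration only).
torch.manual_seed(0)
toy_x = torch.randn(32, 4)
toy_y = toy_x.sum(dim=1, keepdim=True)
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=8)
toy_model = nn.Linear(4, 1)

rmse = evaluate_rmse(toy_model, toy_loader, torch.device("cpu"))
print(f"RMSE: {rmse:.3f}")
```

With the objects from the walkthrough, the call would be evaluate_rmse(model, val_loader, device). Call model.train() again before resuming training, since the helper leaves the model in evaluation mode.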

Please refer to the package documentation for more detailed usage.