A Python implementation of Atom2Vec: a simple way to describe atoms for machine learning
Atom2Vec is a simple but powerful method for turning atoms into vectors, much like Word2Vec does for words in NLP.
To run this program, you will need the SciPy and NumPy packages. If you want to generate your own dataset, you will also need the Requests package for web requests.
- An Anaconda environment is highly recommended.
If you have pip installed, you can install these packages with the following commands.
    # on Linux
    pip3 install scipy numpy requests

    # on Windows
    pip install scipy numpy requests
How To Use
    from Atom2Vec import Atom2Vec

    # data_file: path to the dataset file
    # vec_length: length of atom vector you want
    atoms_vec = Atom2Vec(data_file, vec_length)
    atoms_vec.saveAll()
While it runs, the program prints its progress:

    Generating index 77402/77402 -- Complete!
    Building matrix -- Complete!
    SVD -- Complete!
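The progress messages name an SVD stage. As a rough illustration of that idea only, here is a minimal sketch that reduces a small count matrix to fixed-length row vectors with a truncated SVD. The function `embed_rows_with_svd`, the toy matrix, and the choice to scale by the singular values are all assumptions made for this example; they are not taken from Atom2Vec.py, which may build and factorize its matrix differently.

```python
import numpy as np


def embed_rows_with_svd(matrix, vec_length):
    """Return one vector of at most vec_length entries per row of `matrix`."""
    # Thin SVD: matrix = U @ diag(S) @ Vt, singular values in descending order.
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(vec_length, len(s))
    # Keep the k leading left singular vectors, scaled by their singular values.
    return u[:, :k] * s[:k]


# Hypothetical toy data: 4 atoms (rows) x 5 environments (columns), made-up counts.
toy_counts = np.array([
    [2.0, 0.0, 1.0, 0.0, 3.0],
    [2.0, 0.0, 1.0, 0.0, 2.0],
    [0.0, 4.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 2.0, 0.0],
])

toy_vectors = embed_rows_with_svd(toy_counts, vec_length=2)
print(toy_vectors.shape)  # (4, 2)
```

Rows with similar count patterns (here the first two and the last two) end up with similar vectors, which is the property the atom vectors rely on.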
This package also contains a dataset obtained from the Materials Project using GetMP.py. The raw response is stored in string_3.json. Preprocess.py then processes that response and saves the result to string.json for further use.
The output is written to the following files:
- atoms_vec.txt contains an M * N matrix, where M is the number of atoms and N is the length of the atom vector. Each row is the vector describing one atom.
- atoms_index.txt contains an M * 1 matrix. Each row holds an integer, the atomic number of one atom, which tells you which atom the corresponding row of the vector matrix represents.
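To show how the two output files fit together, the sketch below pairs them row by row into a dictionary keyed by atomic number. It assumes both files are whitespace-delimited text that `numpy.loadtxt` can read, which this README does not confirm. The helper `load_atom_vectors` is hypothetical, and the stand-in files contain made-up numbers so the example runs without the real output.

```python
import os
import tempfile

import numpy as np


def load_atom_vectors(vec_path, index_path):
    """Map each atomic number to its vector, pairing the two files row by row."""
    vectors = np.loadtxt(vec_path, ndmin=2)  # shape (M, N)
    atomic_numbers = np.loadtxt(index_path, ndmin=1).astype(int)  # shape (M,)
    if len(vectors) != len(atomic_numbers):
        raise ValueError("the two files must have the same number of rows")
    return {int(z): vec for z, vec in zip(atomic_numbers, vectors)}


# Stand-in files with made-up values (assumed format: plain whitespace-delimited text).
tmp_dir = tempfile.mkdtemp()
vec_path = os.path.join(tmp_dir, "atoms_vec.txt")
index_path = os.path.join(tmp_dir, "atoms_index.txt")
np.savetxt(vec_path, np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
np.savetxt(index_path, np.array([1, 8]), fmt="%d")

vectors_by_atom = load_atom_vectors(vec_path, index_path)
print(vectors_by_atom[8])  # the row stored for atomic number 8
```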
Atom2Vec.py also contains a simple test, which can be run with:
    # Linux
    python3 Atom2Vec

    # Windows
    python Atom2Vec
If everything works, the program exits without raising any errors.
Interactive Similarity Map
We can use the cosine distance between atom vectors to quantify the similarity between every pair of atoms.
You can find an interactive similarity map at https://www.yuxingfei.com/src/similarity.html
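As a minimal sketch of the cosine distance mentioned above, the following computes all pairwise distances between the rows of a matrix of vectors. The function name `cosine_distance_matrix` and the example vectors are illustrative assumptions. The code behind the similarity map is not shown in this README and may differ.

```python
import numpy as np


def cosine_distance_matrix(vectors):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cosine distance is undefined for zero vectors")
    unit = vectors / norms
    # Clip to guard against tiny floating-point overshoot outside [-1, 1].
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - similarity


# Made-up vectors: the first two point the same way, the third is orthogonal to both.
example_vectors = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 3.0],
])
distances = cosine_distance_matrix(example_vectors)
print(np.round(distances, 3))
```

A distance of 0 means two vectors point in the same direction, 1 means they are orthogonal, and 2 means they point in opposite directions. Vector length does not affect the result.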