GAML

Genetic Algorithm Machine Learning


Keywords
Computational, Chemistry, Genetic, Algorithm, Machine, Learning, force-field, machine-learning, solvent
License
MIT
Install
pip install GAML==0.70.1

Documentation

Genetic Algorithm Machine Learning (GAML)

Genetic Algorithm Machine Learning (GAML) software package for automated force field parameterization.

Xiang Zhong and Orlando Acevedo*, University of Miami

This machine-learning-based software package automates the creation of force field (FF) parameters for molecular dynamics (MD) or Monte Carlo (MC) simulations. The current build emphasizes atomic charge development for solvent simulations using a genetic algorithm crossover/average/mutation method. GAML outputs GROMACS-formatted files in the OPLS-AA formalism for use in MD simulations. By default, the FF parameters are validated against user-supplied free energies of hydration (ΔGhyd), liquid densities, and heats of vaporization (ΔHvap). Additional condensed-phase physical properties are available (or under development) for training, including heat capacity, viscosity, self-diffusivity, dipoles, surface tension, and solubility.
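To make the crossover/average/mutation terminology concrete, here is a minimal Python sketch of the three operators acting on lists of atomic charges. The function names, the single-point crossover, and the uniform mutation within a bound are illustrative assumptions, not GAML's actual implementation.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the list
    return parent_a[:point] + parent_b[point:]

def average(parent_a, parent_b):
    """Element-wise mean of the two parents."""
    return [(x + y) / 2 for x, y in zip(parent_a, parent_b)]

def mutate(parent, bound, rng):
    """Shift one randomly chosen charge by a value drawn from [-bound, bound]."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-bound, bound)
    return child

rng = random.Random(0)  # fixed seed so the example is reproducible
a = [-0.40, 0.10, 0.10, 0.20]  # hypothetical parent charge sets
b = [-0.60, 0.20, 0.20, 0.20]
child_x = crossover(a, b, rng)
child_avg = average(a, b)
child_mut = mutate(a, 0.3, rng)
```

In this sketch only the averaging step preserves the total charge of the parents; crossover and mutation would need a separate correction step to satisfy a total-charge constraint.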

Requirements

Download

git clone https://github.com/orlandoacevedo/GAML.git

Installation

pip[3] install gaml

Or, from the source code

python[3] setup.py install

Usage

For help, use

gaml

Or

gaml -h

Or, for sub-commands

gaml [command] -h

Option 1, use settingfile.txt

     Parameters                                    comments
===========================================       =====================================
command     = charge_gen_range                    # command to execute, required
charge_path = BPYR_BF4_charge_collection.txt      # input file path, required
atomnm      = 24                                  # number of atoms to process, required
percent     = 0.8                                 # optional, default is 0.8
stepsize    = 0.01                                # optional, default is 0.01
nmround     = 3                                   # optional, default is 3
fname       = ChargeGenRange                      # optional, default is ChargeGenRange

The templates for the settingfile.txt can be found in the sample/ directory.
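For readers who want to inspect or generate such a file programmatically, the following Python sketch parses the `key = value  # comment` layout shown above into a dictionary. The `parse_settings` function is a hypothetical helper written for this example; how GAML itself reads the file is not described here.

```python
def parse_settings(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # skip blank lines and the column header
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if not key:
            continue  # skip the '=====' rule line under the header
        settings[key] = value
    return settings

sample = """\
     Parameters                                    comments
===========================================       =====================================
command     = charge_gen_range                    # command to execute, required
charge_path = BPYR_BF4_charge_collection.txt      # input file path, required
atomnm      = 24                                  # required
percent     = 0.8                                 # optional, default is 0.8
"""
settings = parse_settings(sample)
print(settings["command"])
```

All values are kept as strings; converting `atomnm` or `percent` to numbers is left to the caller.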

Option 2, use the command line

Usage:

gaml
    charge_gen_range
    charge_gen_scheme
    file_gen_gaussian
    file_gen_gromacstop
    file_gen_mdpotential
    file_gen_scripts
    fss_analysis
    GAML
    GAML_autotrain


> gaml charge_gen_range

    -f, --charge_path           input charge file path
    -i, --atomnm                total number of atoms in a single system
    -p, --percent               range from 0.0 ~ 1.0, default is 0.8
    -t, --stepsize              default is 0.01
    -nr, --nmround              decimal round-off number, default is 3
    -o, --fname                 output file name, default is ChargeRange
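As an illustration of how the two options relate, the short Python sketch below builds the command line that corresponds to the Option 1 settings file, using only the flags listed above. That the two forms are interchangeable key for key is an assumption made for this example, and the `flag_for` mapping is a hypothetical helper, not part of GAML.

```python
import shlex

# Values copied from the settingfile.txt example in Option 1.
settings = {
    "command": "charge_gen_range",
    "charge_path": "BPYR_BF4_charge_collection.txt",
    "atomnm": 24,
    "percent": 0.8,
    "stepsize": 0.01,
    "nmround": 3,
    "fname": "ChargeGenRange",
}

# Short flags as listed for `gaml charge_gen_range`.
flag_for = {
    "charge_path": "-f",
    "atomnm": "-i",
    "percent": "-p",
    "stepsize": "-t",
    "nmround": "-nr",
    "fname": "-o",
}

argv = ["gaml", settings["command"]]
for key, flag in flag_for.items():
    if key in settings:
        argv += [flag, str(settings[key])]

command_line = shlex.join(argv)  # requires Python 3.8+
print(command_line)
```

The resulting list `argv` could be passed to `subprocess.run` if GAML is installed; the sketch only prints the command.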


> gaml charge_gen_scheme

    -f, --charge_path           input charge file
    -sl, --symmetry_list        list of chemically equivalent atoms, index starting from 1
    -ol, --offset_list          two offsets to fit the charge constraint
    --offset_nm                 number of loops for the offsets
    --cl, --counter_list        force the total charge of this group to zero
    -tc, --total_charge         default is 1.0
    -nz, --bool_nozero          force no zero charges to be generated, default is True
    -nu, --bool_neutral         force final calculated value scaled from 1 or not, default is True
    -q, --bool_limit            force charge sign, either positive or negative, default is None
    -nr, --nmround              decimal round-off number, default is 2
    -b, --in_keyword            the mark of the start in the input file
    -nm, --gennm                output file numbers, default is 5
    -lim, --threshold           threshold for the charge value generation
    -o, --fname                 output file name, default is ChargeRandomGen
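The interplay between the symmetry list, the total charge, and the rounding can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each group of chemically equivalent atoms shares one randomly drawn value, and a single designated atom absorbs the remainder so that the sum matches the total charge. The function name, its arguments, and the use of one offset atom (rather than the two offsets GAML supports) are hypothetical.

```python
import random

def generate_charges(ranges, symmetry_groups, offset_atom, total_charge, nmround, rng):
    """Draw one charge per symmetry group; one atom absorbs the remainder.

    ranges          -- {atom_index: (low, high)} with 1-based indices
    symmetry_groups -- lists of chemically equivalent atoms, each atom listed once
    offset_atom     -- atom set last so that the charges sum to total_charge
    """
    charges = {}
    for group in symmetry_groups:
        if offset_atom in group:
            continue  # assigned after all other atoms
        low, high = ranges[group[0]]
        value = round(rng.uniform(low, high), nmround)
        for atom in group:
            charges[atom] = value  # equivalent atoms share one value
    charges[offset_atom] = round(total_charge - sum(charges.values()), nmround)
    return charges

rng = random.Random(1)
ranges = {1: (-0.5, -0.3), 2: (0.1, 0.2)}  # hypothetical charge ranges
groups = [[1], [2, 3], [4]]                # atoms 2 and 3 are equivalent
charges = generate_charges(ranges, groups, offset_atom=4,
                           total_charge=0.0, nmround=2, rng=rng)
```

Choosing the offset atom from a single-member group keeps the symmetry constraint intact in this sketch.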


> gaml file_gen_gaussian

    -ftop, --toppath            GROMACS topology file
    -f, --file_path             GROMACS output pdb/gro file
    -sr, --select_range         Angstrom, default is 10
    -bs, --basis_set            Gaussian definition, default is # HF/6-31G(d) Pop=CHelpG
    -cs, --charge_spin          Gaussian definition, default is 0 1
    -nm, --gennm                output file numbers, default is 5
    -o, --fname                 output file name, default is GaussInput


> gaml file_gen_gromacstop

    -f, --charge_path           input charge file
    -ftop, --toppath            GROMACS topology file
    -sl, --symmetry_list        Python-style list of chemically equivalent atoms
    -res, --reschoose           choose residue, default is ALL
    -b, --in_keyword            the mark of the start in the input file
    -e, --cut_keyword           the mark of the end in the input file
    -nm, --gennm                output file numbers, default is 5
    -o, --fname                 output file name, default is GromacsTopfile


> gaml GAML

    -f, --file_path             input MD file
    -fc, --charge_path          input charge file
    -sl, --symmetry_list        list of chemically equivalent atoms, index starting from 1
    -ol, --offset_list          two offsets to fit the charge constraint
    --offset_nm                 number of loops for the offsets
    --cl, --counter_list        force the total charge of this group to zero
    -tc, --total_charge         default is 0.0
    -nz, --bool_nozero          force no zero charges to be generated, default is True
    -nu, --bool_neutral         force final calculated value scaled from 1 or not, default is True
    -q, --bool_limit            force charge sign, either positive or negative, default is None
    -nr, --nmround              decimal round-off number, default is 2
    -nm, --gennm                output file numbers, default is 5
    -lim, --threshold           threshold for the charge value generation
    -d, --error_tolerance       default is 0.8
    -ex, --charge_extend_by     the value to mutate charge range bound, default is 0.3
    -ro, --ratio                ratio of crossover to average to mutation, default is 7:2:1
    -abs, --bool_abscomp        use absolute value or not
    -e, --cut_keyword           the mark of the end in the input file
    -o, --fname                 output file name, default is ChargeGen
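One plausible reading of the `--ratio` option is that it decides how many new charge sets each operator contributes per generation. The Python sketch below splits a total count according to a ratio string such as `7:2:1`, using a largest-remainder rule for the leftover. The function name and the rounding rule are assumptions made for illustration; how GAML applies the ratio internally is not stated in this document.

```python
def split_by_ratio(total, ratio="7:2:1"):
    """Split `total` into integer counts proportional to a 'a:b:c' ratio."""
    weights = [int(part) for part in ratio.split(":")]
    scale = sum(weights)
    counts = [total * w // scale for w in weights]       # floor of each share
    remainders = [total * w % scale for w in weights]    # what the floor dropped
    leftover = total - sum(counts)
    # sorted() is stable, so ties go to the operator listed first.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Counts for crossover, average, and mutation, in that order.
per_operator = split_by_ratio(10, "7:2:1")
```

With small totals, an operator with a low weight can receive zero sets, which is worth keeping in mind when choosing `--gennm`.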


> gaml fss_analysis

    -f, --file_path             input file to analyze
    -t, --stepsize              default is 0.01
    -d, --error_tolerance       default is 0.28
    -abs, --bool_abscomp        use the absolute value or not, default is False
    -p, --percent               range from 0.0 ~ 1.0, default is 0.95
    -e, --cut_keyword           the mark of the end in the input file, default is MAE
    -tl, --atomtype_list        corresponding atom types, note the character '#' is not supported
    -pn, --pallette_nm          number of palettes used to plot the graph, default is 50
    -cm, --color_map            compatible with Matplotlib modules, default is rainbow
    -o, --fname                 output file name, default is FSSPlot


> gaml file_gen_mdpotential

    -f, --file_path             MD simulation result file
    -s, --chargefile            input charge file
    -lv, --literature_value     corresponding literature value
    -i, --atomnm                total number of molecules in liquid phase, default is 500
    --MAE                       mean absolute error (MAE), default is 0.05
    --temperature               unit in Kelvin
    --block                     mark for file process, default is COUNT
    --bool_gas                  gas phase calculation, default is False
    -kw, --kwlist               MD result keyword list, default is Density
    -o, --fname                 output file name, default is MDProcess
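The options above (number of molecules in the liquid phase, temperature, gas-phase flag, literature value) match the inputs of the standard heat of vaporization estimate, ΔHvap ≈ E(gas) − E(liquid)/N + RT. The Python sketch below evaluates that relation and compares it with a literature value. It assumes ideal-gas behavior and energies in kJ/mol; the function name and the numbers are hypothetical, and whether file_gen_mdpotential computes exactly this quantity is not stated here.

```python
R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def heat_of_vaporization(e_gas, e_liquid_total, n_molecules, temperature):
    """dHvap = E_gas - E_liquid/N + R*T, all energies in kJ/mol.

    e_gas          -- average potential energy of one molecule in the gas phase
    e_liquid_total -- average potential energy of the whole liquid box
    n_molecules    -- number of molecules in the liquid box
    temperature    -- temperature in Kelvin
    """
    return e_gas - e_liquid_total / n_molecules + R_KJ_PER_MOL_K * temperature

# Hypothetical numbers for a 500-molecule liquid box at 298.15 K.
dhvap = heat_of_vaporization(e_gas=10.0, e_liquid_total=-15000.0,
                             n_molecules=500, temperature=298.15)
literature = 42.0  # hypothetical reference value in kJ/mol
relative_error = abs(dhvap - literature) / literature
```

A relative error computed this way can be compared against a tolerance such as the 0.05 default listed for `--MAE`, assuming that option is a fractional threshold.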


> gaml file_gen_scripts

    -n, --number                number of the script to generate, as listed by -a
    -a, --available             show available built-in scripts


> gaml GAML_autotrain

    -f, --file_path             all-in-one file of auto-training parameters
    --bashinterfile             user defined Bash interface file

Notes

A test for a 1-butylpyridinium-based ionic liquid can be found under the sample/ directory.

The OPLS-AA parameters for 86 conventional solvents optimized by GAML can be found under the Solvents/ directory. The files are formatted for GROMACS.

Some features worth mentioning:

  • Customized selection range for Coulombic interactions with PBC removal
  • Two offsets as well as chemical equivalence considerations for random charge generation
  • The crossover/average/mutation method
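The first feature, selecting atoms within a range while removing periodic boundary condition (PBC) effects, is commonly handled with the minimum-image convention. The Python sketch below shows that convention for an orthorhombic box and a cutoff in Angstrom. It is a generic illustration: the function names, the box shape, and the coordinates are assumptions, and GAML's own selection logic may differ.

```python
import math

def minimum_image(delta, box_length):
    """Wrap one displacement component into [-box_length/2, box_length/2]."""
    return delta - box_length * round(delta / box_length)

def within_range(center, point, box, cutoff):
    """True if `point` lies within `cutoff` of `center` under the minimum image."""
    d2 = sum(minimum_image(p - c, length) ** 2
             for c, p, length in zip(center, point, box))
    return math.sqrt(d2) <= cutoff

box = (30.0, 30.0, 30.0)    # hypothetical orthorhombic box, in Angstrom
center = (1.0, 1.0, 1.0)
near_across_boundary = within_range(center, (29.0, 1.0, 1.0), box, cutoff=10.0)
far_inside_box = within_range(center, (15.0, 1.0, 1.0), box, cutoff=10.0)
```

Here the atom at x = 29 Å is only 2 Å away from the center once the periodic image is taken into account, while the atom at x = 15 Å is 14 Å away and falls outside a 10 Å selection range.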

References

Zhong, X.; Velez, C.; Acevedo, O. "Partial Charges Optimized by Genetic Algorithms for Deep Eutectic Solvent Simulations" J. Chem. Theory Comput., 2021, 17, (in press). doi:10.1021/acs.jctc.1c00047

About

Contributing Authors: Xiang Zhong and Orlando Acevedo*

Funding: Gratitude is expressed to the National Science Foundation.

Software License: GAML. Genetic Algorithm Machine Learning (GAML) software package. Copyright (C) 2021 Orlando Acevedo