Calculates the Molecular Weight, to the appropriate significant digits, from a string of an arbitrary chemical formula, a protein sequence of one- or three-letter codes, or chemical common names that are recognized by PubChem.

Calculating the Molecular Weight from a Chemical Formula, Common Name, or Protein Sequence

The molecular weight (MW) can be algebraically calculated from any chemical formula that adheres to chemical conventions, as the examples below illustrate.

The ChemMW object of ChemW parses a chemical formula string, which can consist of any combination of elements and decimal stoichiometry, and calculates the MW of the chemical formula. The number of significant digits in the reported MW matches the fewest significant digits among the masses of the elements that constitute the chemical formula. The elemental masses are the most precise contemporary measurements of pure elements, per the chemicals module.
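The significant-digit rule described above can be sketched in a few lines. This is a minimal illustration, not ChemW's actual implementation: the MASSES table holds a handful of rounded standard atomic masses chosen for the example (ChemW takes its values from the chemicals module), and the helper names sig_figs, round_sig, and mw_with_sig_figs are hypothetical.

```python
from decimal import Decimal
from math import floor, log10

# Illustrative atomic masses, stored as strings so that their precision is preserved.
# These are assumptions for the sketch, not the values that ChemW loads.
MASSES = {"H": "1.008", "C": "12.011", "O": "15.999", "Na": "22.98976928"}


def sig_figs(number: str) -> int:
    """Count the significant digits of a positive decimal literal such as '12.011'."""
    return len(Decimal(number).as_tuple().digits)


def round_sig(value: float, figures: int) -> float:
    """Round a value to the given number of significant digits."""
    if value == 0:
        return 0.0
    return round(value, figures - 1 - floor(log10(abs(value))))


def mw_with_sig_figs(counts: dict) -> tuple:
    """Return (raw MW, MW rounded to the fewest significant digits of its elements)."""
    raw = sum(float(MASSES[element]) * amount for element, amount in counts.items())
    figures = min(sig_figs(MASSES[element]) for element in counts)
    return raw, round_sig(raw, figures)


# Glucose, C6H12O6: hydrogen (1.008) limits the result to four significant digits.
raw_mw, mw = mw_with_sig_figs({"C": 6, "H": 12, "O": 6})
print(mw)  # 180.2
```

The unrounded and rounded values in this sketch play the same roles as the raw_mw and mw attributes that ChemMW exposes.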

The Proteins and PHREEQdb objects are applications of the ChemMW object that expand the utility of this library. The Proteins object returns the mass of a protein by parsing either a string of a protein sequence or a FASTA-formatted file. This is applied in the Codons module for genome-scale biology and bioengineering. The PHREEQdb object of ChemW parses a PHREEQ database via the ChemMW object and exports a JSON of the masses of all minerals that the database describes. This is applied to calculate the masses of complex minerals in the PHREEQC databases of the ROSSpy module for reverse osmosis research.

The ChemW library is offered with the MIT License. Examples of the module are available in the examples directory of the ChemW GitHub repository. Please submit errors or inaccuracies as GitHub issues so that they may be resolved.


The following command installs ChemW in a command prompt/terminal environment:

pip install chemw



The data environment, in a Python IDE, is defined:

import chemw
chem_mw = chemw.ChemMW(verbose = False, printing = True)
  • verbose & printing bool: specifies whether troubleshooting information or MW results will be printed, respectively.


chem_mw.mass(formula = None, common_name = None)
  • formula str: the chemical formula for which the MW is desired. A broad range of formats is accepted, as the following table illustrates:
Example chemical Format option
'C6H12O6' Any organic compounds can be easily processed.
'C60_H120_O2' Underscores can arbitrarily separate content, since these are ignored by ChemMW.
An arbitrary number of groups can be distinguished in the chemical formula, with () denoting the boundaries of the specified group.
'Na2.43Cl(Ca(OH)2)1.2' Chemical groups can be nested, with differing stoichiometric values.
Water molecules can be complexed, with a leading stoichiometric quantity of the complexation.
Stoichiometry can be any decimal for any atom in a molecule, and even omit a leading zero.
'Na2SO4:3K2SO4' Non-water entities can be complexed.
'CaCl2:(MgCl2)2:12H2O' Multiple complexations can be applied with repeated : separators.
'Ca1.019Na.136K.006Al2.18Si6.82O18:7.33H2O' The complexity, while remaining within the aforementioned format, is arbitrary.
  • common_name str: the common name of the chemical for which the MW is desired, as it is recognized by PubChem.
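To make the accepted formula formats concrete, the following is a minimal sketch of a parser that handles underscores, nested groups, decimal stoichiometry, and : complexations with leading coefficients. It is an independent illustration under those stated rules, not the parser that ChemMW uses; the names NUMBER, TOKEN, _parse_group, and element_counts are hypothetical. It returns element counts only. Multiplying each count by an atomic mass and summing would give the MW.

```python
import re

NUMBER = r"\d*\.?\d+"  # matches 2, 2.43, and .136
TOKEN = re.compile(rf"[A-Z][a-z]?|\(|\)|{NUMBER}")


def _parse_group(tokens, pos):
    """Parse tokens from pos until ')' or the end; return (counts, next position)."""
    counts = {}
    while pos < len(tokens) and tokens[pos] != ")":
        token = tokens[pos]
        pos += 1
        if token == "(":
            unit, pos = _parse_group(tokens, pos)  # nested group
            if pos == len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing ')'
        elif token[0].isalpha():
            unit = {token: 1.0}  # a single element symbol
        else:
            raise ValueError(f"unexpected number {token!r}")
        factor = 1.0
        if pos < len(tokens) and re.fullmatch(NUMBER, tokens[pos]):
            factor = float(tokens[pos])  # trailing stoichiometry
            pos += 1
        for element, amount in unit.items():
            counts[element] = counts.get(element, 0.0) + amount * factor
    return counts, pos


def element_counts(formula):
    """Return a dict of element -> total stoichiometric amount for a formula string."""
    totals = {}
    for segment in formula.replace("_", "").split(":"):
        lead = re.match(NUMBER, segment)  # e.g. the 12 in ':12H2O'
        coefficient = float(lead.group()) if lead else 1.0
        body = segment[lead.end():] if lead else segment
        tokens = TOKEN.findall(body)
        if not tokens or "".join(tokens) != body:
            raise ValueError(f"cannot parse {segment!r}")
        counts, pos = _parse_group(tokens, 0)
        if pos != len(tokens):
            raise ValueError("unmatched ')'")
        for element, amount in counts.items():
            totals[element] = totals.get(element, 0.0) + coefficient * amount
    return totals


print(element_counts("CaCl2:(MgCl2)2:12H2O"))
```

A recursive descent over a flat token list keeps the handling of nested groups short: each ( starts a recursive call, and the number that follows the matching ) scales everything the call returned.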

Accessible content

The ChemMW object retains numerous components that are accessible to the user:

  • mw & raw_mw float: The MW of the parameterized chemical formula with and without the proper significant digits, respectively.
  • proportions dict: The ratio of elements in the chemical formula. This loses accuracy with the grouped elements, and is being improved.
  • formula str: The original chemical formula as a string.
  • groups int: A numerical counter for the quantity of chemical groups in the chemical formula.
  • group_masses dict: A dictionary for the masses of each nesting level in a molecule.



The data environment, in a Python IDE, is defined:

import chemw
protein = chemw.Proteins(verbose = False, printing = True)
  • verbose & printing bool: specifies whether troubleshooting information or MW results will be printed, respectively.


protein.mass(protein_sequence = None, fasta_path = None, fasta_link = None)
  • protein_sequence str: The sequence of the protein for which the MW is desired. Two formats are accepted, as the following examples illustrate:
Example sequence Format option
'LFCTHGLERVVZCLWHKRCCSTRLKSLLLRGCABC*' A single string of the one-letter amino acid codes. A trailing "*" is acceptable.
'gly-gln-his-ala-arg-asn-phe-pro-thr' A sequence of three-letter amino acid codes must be delimited with hyphens.
  • fasta_path & fasta_link str: The path and URL link, respectively, to a FASTA file that contains the sequence, or multiple sequences, of the protein(s) for which the MW is desired. Each sequence must commence with a > as the first character of the description line.
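The two sequence formats above can be reduced to a single form before any mass is summed. The sketch below shows one way to do that, assuming the standard three-letter amino acid codes plus the ambiguity codes asx (B) and glx (Z). The THREE_TO_ONE table and the to_one_letter helper are hypothetical names for this illustration and are not part of the ChemW API.

```python
# Standard three-letter codes mapped to one-letter codes, plus two ambiguity codes.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
    "asx": "B", "glx": "Z",
}


def to_one_letter(sequence: str) -> str:
    """Normalize a one-letter or hyphen-delimited three-letter sequence."""
    sequence = sequence.strip().rstrip("*")  # a trailing '*' is tolerated
    if "-" in sequence:
        try:
            return "".join(THREE_TO_ONE[code.lower()] for code in sequence.split("-"))
        except KeyError as error:
            raise ValueError(f"unknown three-letter code: {error}") from None
    return sequence.upper()


print(to_one_letter("gly-gln-his-ala-arg-asn-phe-pro-thr"))  # GQHARNFPT
```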

Accessible content

The Proteins object retains numerous components that are accessible to the user:

  • protein_mass & raw_protein_mass float: The protein mass that is adjusted and unadjusted for the appropriate number of significant digits.
  • fasta_protein_masses dict: A dictionary of each sequence from processing a FASTA file, where the value is the corresponding sequence's mass.
  • amino_acid_masses dict: A dictionary of all natural amino acids, and their masses to the appropriate number of significant digits.
  • fasta_lines list: The raw list of lines that constitute the loaded FASTA file, which can be used for post-processing.
  • sigfigs float: The number of sigfigs that are defined for each protein.
  • chem_mw ChemMW: An instance of the ChemMW object is loaded, which allows the user to access the ChemMW module through the Proteins object.
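As an example of the post-processing that fasta_lines permits, the following sketch groups a list of FASTA lines into a dictionary of description to sequence. It assumes that fasta_lines is a plain list of strings, one per line of the file. The group_fasta helper and the sample records are hypothetical and serve only to illustrate the idea.

```python
def group_fasta(lines):
    """Map each FASTA description line to its concatenated sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # description without the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line.rstrip("*")  # drop a terminating '*'
    return records


# Hypothetical content, shaped like the lines of a two-sequence FASTA file.
sample_lines = [
    ">protein_1 example\n",
    "MKTAYIAK\n",
    "QRQISFVK\n",
    "\n",
    ">protein_2 example\n",
    "GQHARNFPT*\n",
]
print(group_fasta(sample_lines))
```

Each resulting sequence could then be passed individually as protein_sequence to protein.mass().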



The data environment, in a Python IDE, is defined:

import chemw
phreeq_db = chemw.PHREEQdb(output_path = None, verbose = False, printing = False)
  • output_path str: optionally specifies the path to which the processed PHREEQ database file will be exported, where None selects the current working directory.
  • verbose & printing bool: optionally specifies whether progress or results of the calculations, respectively, are printed. The former is valuable for troubleshooting while the latter is beneficial for reviewing a readout summary of the calculations.


The process() function converts a PHREEQ database file into a JSON file of the elements and minerals, with their respective formulas and MWs:

  • db_path str: The path to the .dat PHREEQ database file that will be processed.
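The general idea of turning a PHREEQ database into JSON can be sketched as follows. This is a simplified illustration, not the PHREEQdb implementation: it assumes that each mineral in the PHASES block is given as an unindented name line followed by an indented reaction line whose first term is the mineral formula, and that a data block ends at the next all-uppercase keyword line. The minerals_from_phases function and the sample text are hypothetical, and the sketch records only names and formulas, whereas PHREEQdb also reports masses.

```python
import json
import re


def minerals_from_phases(text):
    """Return {mineral name: formula} from the PHASES block of database text."""
    minerals = {}
    in_phases = False
    name = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line.strip():
            continue
        if re.fullmatch(r"[A-Z_]+", line.strip()):  # a keyword such as PHASES or END
            in_phases = line.strip() == "PHASES"
            name = None
            continue
        if not in_phases:
            continue
        if not line[0].isspace():
            name = line.split()[0]  # unindented line: the mineral name
        elif name is not None and "=" in line:
            minerals[name] = line.split("=")[0].split()[0]  # first reactant
            name = None  # later indented lines are options such as -log_k
    return minerals


sample = "\n".join([
    "SOLUTION_SPECIES",
    "H2O = OH- + H+",
    "PHASES",
    "Calcite",
    "    CaCO3 = CO3-2 + Ca+2",
    "    -log_k  -8.48",
    "Gypsum  # a hydrated mineral",
    "    CaSO4:2H2O = Ca+2 + SO4-2 + 2 H2O",
    "    -log_k  -4.58",
    "END",
])
print(json.dumps(minerals_from_phases(sample), indent=4))
```

Formulas such as CaSO4:2H2O use the same : complexation notation that ChemMW accepts, which is why the formula strings can be handed to a MW calculation without further conversion.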

Accessible content

The PHREEQdb object retains numerous components that are accessible to the user:

  • db_name str: The name of the database that is parsed in the process() function.
  • db, minerals, & elements pandas.DataFrame: The entire PHREEQ database, its minerals, and its elements, respectively, each expressed as a Pandas DataFrame object with labeled columns.
  • chem_mw ChemMW: An instance of the ChemMW object is loaded, which allows the user to access the ChemMW module through the PHREEQdb module.