A maximum-strength name parser for record linkage.


Keywords
data, database, English, names, parser, text, data-science, parsing, record-linkage, entity-resolution, deduplication, data-matching, human-name
License
Other
Install
pip install nominally==1.1.0

Documentation

Nominally Logo

nominally: a maximum-strength name parser for record linkage

License: AGPL 3.0+ Distributed via PyPI Maintainability rated at Code Climate Builds at CircleCI Test coverage at Coveralls Documentation at Read the Docs Latest commit at GitHub

🔗 Names

Nominally simplifies and parses a personal name written in Western name order into six core fields: title, first, middle, last, suffix, and nickname.

Typically, nominally is used to parse entire lists or pd.Series of names en masse. This package includes a command line tool to parse a single name for convenient one-off testing and examples.

Nominally produces fields intended for comparisons between or within datasets. As such, names come out formatted for data without regard to human syntactic preference: de von ausfern, mr johann g rather than Mr. Johann G. de von Ausfern.

📓 Getting Started

Call parse_name() to parse out the six core fields:

$ python -q
>>> from nominally import parse_name
>>> parse_name("Blankinsop, Jr., Mr. James 'Jimmy'")
{
  'title': 'mr',
  'first': 'james',
  'middle': '',
  'last': 'blankinsop',
  'suffix': 'jr',
  'nickname': 'jimmy'
}

Dive into the Name class to parse and recreate a string...

>>> from nominally import Name
>>> n = Name("DR. PEACHES BARTKOWICZ")
>>> n
Name({'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''})
>>> str(n)
'bartkowicz, dr peaches'

...or use the dict...

>>> dict(n)
{
  'title': 'dr',
  'first': 'peaches',
  'middle': '',
  'last': 'bartkowicz',
  'suffix': '',
  'nickname': ''
}
>>> list(n.values())
['dr', 'peaches', '', 'bartkowicz', '', '']

...or retrieve a more elaborate set of attributes...

>>> n.report()
{
  'raw': 'DR. PEACHES BARTKOWICZ',
  'cleaned': {'dr peaches bartkowicz'},
  'parsed': 'bartkowicz, dr peaches',
  'list': ['dr', 'peaches', '', 'bartkowicz', '', ''],
  'title': 'dr',
  'first': 'peaches',
  'middle': '',
  'last': 'bartkowicz',
  'suffix': '',
  'nickname': ''
}

...or capture individual attributes.

>>> n.first
'peaches'
>>> n['last']
'bartkowicz'
>>> n.get('title')
'dr'
>>> n.raw
'DR. PEACHES BARTKOWICZ'

🖥️ Command Line

For a quick report, invoke the nominally command line tool:

$ nominally "DR. PEACHES BARTKOWICZ"
       raw: DR. PEACHES BARTKOWICZ
   cleaned: dr. peaches bartkowicz
    parsed: bartkowicz, dr peaches
      list: ['dr', 'peaches', '', 'bartkowicz', '', '']
     title: dr
     first: peaches
    middle:
      last: bartkowicz
    suffix:
  nickname:

🔬 Worked Examples

Binder hosts live Jupyter notebooks walking through examples of nominally.

     csv.ipynb on mybinder.org

     pandas_simple.ipynb on mybinder.org

These notebooks and additional examples reside in the Nominally Examples repository.

🧙‍ Author

Matt VanEseltine

https://pypi.org/user/matvan/

matvan@umich.edu

https://github.com/vaneseltine

https://twitter.com/vaneseltine

https://stackoverflow.com/users/7846185/matt-vaneseltine

💡 Acknowledgements

Nominally started as a fork of the python-nameparser package, and has benefitted considerably from this origin⸺especially the wealth of examples and tests developed for python-nameparser.