personnamenorm

Unifying person names in different notations


Keywords
linguistics, python3, text-processing
License
MIT
Install
pip install personnamenorm==0.2

Documentation

Person Name Normalisation

Unifying person names in different notations

Different sources write person names in different notations:

  • Firstname Secondname Lastname
  • Lastname, Firstname Secondname
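To illustrate what unifying these two notations means, here is a minimal sketch in plain Python. It is not the library's implementation: the function name `split_notation` is hypothetical, and it assumes that a comma marks the "Lastname, Firstname" form and that otherwise the final token is the last name.

```python
from typing import List, Tuple


def split_notation(raw: str) -> Tuple[List[str], List[str]]:
    """Return (first_names, last_names) for either notation (hypothetical helper)."""
    if "," in raw:
        # "Lastname, Firstname Secondname": split once at the comma
        last_part, first_part = raw.split(",", 1)
        return first_part.split(), last_part.split()
    # "Firstname Secondname Lastname": assume the final token is the last name
    tokens = raw.split()
    return tokens[:-1], tokens[-1:]


a = split_notation("Firstname Secondname Lastname")
b = split_notation("Lastname, Firstname Secondname")
print(a == b)  # both notations yield the same parts
```

Titles and name prefixes are deliberately ignored here; handling them needs a lookup of known tokens.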

The following are also extracted:

  • academic degrees (e.g. 'Dr.', 'Ph.D.')
  • name prefixes (e.g. 'van ter', 'von', 'De')

Included: German, French, Italian, Dutch

Missing: Spanish, Portuguese

Missing: double last names in Spanish

Installation

pip install personnamenorm

Usage

import personnamenorm as pnn
nameobj = pnn.namenorm('Dr. Dipl. Firstname Secondname von und zu Lastname')
This results in:
nameobj.name <dict>
{
    'raw': 'Dr. Dipl. Firstname Secondname von und zu Lastname',
    'Firstname': ['Firstname','Secondname'],
    'Lastname': ['Lastname'],
    'title': ['Dr.','Dipl.'],
    'prefix': ['von und zu']
}

nameobj.fullname <str>
'von und zu Lastname, Firstname Secondname'

nameobj.fullname_abbrev <str>
'von und zu Lastname, F S'
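The relation between the dictionary and the two string forms can be sketched as follows. This is an assumption about how such strings could be derived, not the library's code; `build_fullname` is a hypothetical helper that works on a dict shaped like `nameobj.name` above.

```python
def build_fullname(name: dict, abbreviate: bool = False) -> str:
    """Join prefix and last name, then append the first names (hypothetical helper)."""
    firsts = name.get("Firstname", [])
    if abbreviate:
        # keep only the initial of each first name
        firsts = [f[0] for f in firsts if f]
    last = " ".join(name.get("prefix", []) + name.get("Lastname", []))
    first = " ".join(firsts)
    return f"{last}, {first}" if first else last


example = {
    "Firstname": ["Firstname", "Secondname"],
    "Lastname": ["Lastname"],
    "title": ["Dr.", "Dipl."],
    "prefix": ["von und zu"],
}
print(build_fullname(example))
print(build_fullname(example, abbreviate=True))
```

In this sketch the titles are not part of either string, which matches the output shown above.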

More examples can be found in this file on GitHub.

Debug-mode

Debug mode is off by default.

Passing True as the second argument activates it:

nameobj = pnn.namenorm(<str>, True)

The call then emits additional information as a logging message:

  • used annotation dictionary
  • annotated input string as list of tuples
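What an "annotated input string as list of tuples" might look like can be sketched as below. The dictionary contents, the tag names, and the function `annotate` are assumptions made for illustration; the library's actual annotation dictionary and tags may differ.

```python
import logging

# Hypothetical, heavily reduced annotation dictionary (not the library's data)
ANNOTATIONS = {
    "Dr.": "title",
    "Dipl.": "title",
    "von": "prefix",
    "und": "prefix",
    "zu": "prefix",
}


def annotate(raw, debug=False):
    """Tag each whitespace-separated token; unknown tokens are tagged 'name'."""
    annotated = [(tok, ANNOTATIONS.get(tok, "name")) for tok in raw.split()]
    if debug:
        log = logging.getLogger("namenorm_sketch")
        log.debug("annotation dictionary: %s", ANNOTATIONS)
        log.debug("annotated input: %s", annotated)
    return annotated


print(annotate("Dr. Firstname von Lastname", debug=True))
```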

Logging

Logging is implemented as follows:

  • writes to stdout if no logging was configured beforehand
  • writes to the existing logging handler if logging was already configured
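One common way to get this behaviour with the standard `logging` module is sketched below. It is an assumption about the approach, not the library's code; `ensure_stdout_handler` and the logger name are hypothetical.

```python
import logging
import sys


def ensure_stdout_handler(logger: logging.Logger) -> bool:
    """Attach a stdout handler only if no handler is reachable; return True if one was added."""
    if logger.hasHandlers():
        # logging was configured before: reuse the existing handler(s)
        return False
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    return True


log = logging.getLogger("namenorm_sketch")
log.setLevel(logging.DEBUG)
ensure_stdout_handler(log)
log.debug("debug output goes to stdout or to the handler configured earlier")
```

`Logger.hasHandlers()` also checks parent loggers, so a handler configured on the root logger by the calling application is respected.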

Test

See the folder 'tests' on GitHub.

python test_personnamenorm.py