Person Name Normalisation
Unifying person names in different notations
Different sources write person names in different notations:
- Firstname Secondname Lastname
- Lastname, Firstname Secondname
The following are also extracted:
- academic degrees (e.g. 'Dr.', 'Ph.D.')
- name prefixes (e.g. 'van ter', 'von', 'De')
Included: German, French, Italian, Dutch
Missing: Spanish, Portuguese
Missing: double last names in Spanish
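As a rough illustration of the two notations listed above, the following minimal sketch tells them apart by the comma. The helper split_notation is hypothetical and is not part of personnamenorm; it ignores titles and prefixes and only assumes that a comma separates the last name from the first names.

```python
def split_notation(raw: str) -> tuple[list[str], list[str]]:
    """Return (first_names, last_names) for the two notations.

    Hypothetical helper for illustration only, not the library's code.
    Titles and name prefixes are not handled here.
    """
    if "," in raw:
        # 'Lastname, Firstname Secondname'
        last, first = raw.split(",", 1)
        return first.split(), last.split()
    # 'Firstname Secondname Lastname': treat the final token as the last name
    tokens = raw.split()
    return tokens[:-1], tokens[-1:]


print(split_notation("Firstname Secondname Lastname"))
print(split_notation("Lastname, Firstname Secondname"))
```

Both calls yield the same pair of lists, which is the point of normalising the notation before any further processing.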
Installation
pip install personnamenorm
Usage
import personnamenorm as pnn
nameobj = pnn.namenorm('Dr. Dipl. Firstname Secondname von und zu Lastname')
This results in:
nameobj.name <dict>
{
'raw': 'Dr. Dipl. Firstname Secondname von und zu Lastname',
'Firstname': ['Firstname','Secondname'],
'Lastname': ['Lastname'],
'title': ['Dr.','Dipl.'],
'prefix': ['von und zu']
}
nameobj.fullname <str>
'von und zu Lastname, Firstname Secondname'
nameobj.fullname_abbrev <str>
'von und zu Lastname, F S'
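The relation between the dictionary and the two string forms shown above can be sketched as follows. This is an assumption about how the strings are assembled, inferred only from the example output; build_fullname is a hypothetical helper, not a function of personnamenorm.

```python
# Dictionary in the shape shown above (the 'raw' key is omitted for brevity).
name = {
    "Firstname": ["Firstname", "Secondname"],
    "Lastname": ["Lastname"],
    "title": ["Dr.", "Dipl."],
    "prefix": ["von und zu"],
}


def build_fullname(name: dict, abbreviate: bool = False) -> str:
    """Assemble 'prefix Lastname, Firstname ...' (hypothetical helper)."""
    first = name["Firstname"]
    if abbreviate:
        # Keep only the initial of every first name.
        first = [n[:1] for n in first]
    last = " ".join(name["prefix"] + name["Lastname"])
    return f"{last}, {' '.join(first)}"


print(build_fullname(name))
print(build_fullname(name, abbreviate=True))
```

In this sketch the titles are kept in the dictionary but left out of both string forms, matching the example output above.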
More examples can be found in this file on GitHub.
Debug-mode
By default, debug mode is off.
Activating debug mode by passing True as the second argument
nameobj = pnn.namenorm(<str>, True)
returns additional information as logging messages:
- used annotation dictionary
- annotated input string as list of tuples
Logging
Logging is implemented:
- writes to stdout if logging has NOT been enabled beforehand
- writes to the existing logging handler if logging HAS been enabled beforehand
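The behaviour described above can be approximated with the standard logging module alone. The sketch below is an assumption about the pattern, not the library's actual code; get_logger and the logger name "personnamenorm_sketch" are hypothetical.

```python
import logging
import sys


def get_logger(name: str = "personnamenorm_sketch") -> logging.Logger:
    """Reuse existing logging if configured, else fall back to stdout."""
    logger = logging.getLogger(name)
    # hasHandlers() checks this logger and its ancestors, including root.
    if not logger.hasHandlers():
        # Nothing was configured beforehand: write to stdout.
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger


log = get_logger()
log.debug("annotated input would be reported here")
```

If the calling application has already set up logging, for example with logging.basicConfig, the message propagates to that configuration and no extra handler is attached.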
Test
See the folder 'tests' on GitHub.
python test_personnamenorm.py