Python library to guess gender given a Spanish full name


Keywords
gender, guess, spanish, name
License
MIT
Install
pip install genderator==0.2.7.9

Documentation

genderator

Genderator is a Python library to process Spanish names (from Spain) to guess their gender.

For this to work, the library uses the following datasets from the Instituto Nacional de Estadística:

  • name_surname_ratio: List of words that can be either a name or a surname, with the probability that each one is a surname.
  • names_ine: List of names registered in Spain, with the probability that each one is a male or a female name.
  • surnames_ine: List of surnames registered in Spain.

Installation

The easiest way to install the latest version is by using pip to pull it from PyPI:

pip install genderator

You may also use Git to clone the repository from GitHub and install it manually:

git clone https://github.com/davidmogar/genderator.git
cd genderator
python setup.py install

Python 3.3 & 3.4 are supported.

Usage

The following code shows a sample usage of this library:

import genderator

guesser = genderator.Parser()
answer = guesser.guess_gender('David Moreno García')
if answer:
    print(answer)
else:
    print('Name doesn\'t match')

Output:

OrderedDict([
    ('names', ['david']),
    ('surnames', ['moreno', 'garcia']),
    ('real_name', 'david'),
    ('gender', 'Male'),
    ('confidence', 1.0)
])
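The result can be consumed like any other dictionary. The following is a minimal sketch of one way to do that, using the sample output above as input. The `summarize` helper and its `min_confidence` threshold are hypothetical names introduced for illustration and are not part of genderator; the sketch also assumes that a non-matching name yields a falsy value, as the `if answer:` check in the usage example suggests.

```python
from collections import OrderedDict


def summarize(answer, min_confidence=0.8):
    """Turn a guess_gender-style result into a one-line summary.

    Hypothetical helper: the keys used here are the ones shown in the
    sample output ('names', 'surnames', 'gender', 'confidence').
    """
    if not answer:
        # Assumption: a falsy result means the name could not be matched.
        return "Name doesn't match"
    if answer['confidence'] < min_confidence:
        # The threshold is an arbitrary choice for this example.
        return 'Uncertain'
    full_name = ' '.join(answer['names'] + answer['surnames']).title()
    return '{}: {} ({:.0%})'.format(
        full_name, answer['gender'], answer['confidence'])


# The sample output from this README, reused as input.
sample = OrderedDict([
    ('names', ['david']),
    ('surnames', ['moreno', 'garcia']),
    ('real_name', 'david'),
    ('gender', 'Male'),
    ('confidence', 1.0),
])

print(summarize(sample))  # David Moreno Garcia: Male (100%)
```

Filtering on the confidence value is a design choice for callers who prefer no answer over a weak guess; the threshold should be tuned to the data at hand.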

Options

Genderator's parser can receive some arguments to control its behaviour. Those arguments are:

  • force_combinations=Boolean: Force combinations during classification.
  • force_split=Boolean: Force name split if no surnames are detected.
  • normalize=Boolean: Enable or disable normalization.
  • normalizer_options=Dictionary: Normalizer options to be applied.
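As a sketch of how the Boolean arguments listed above might be passed, the snippet below collects them in a dictionary and hands them to the parser as keyword arguments. The values are illustrative only, since the defaults are not stated here, and the import is guarded so that the snippet still loads when the library is not installed.

```python
# Keyword arguments named in the list above. The values are examples,
# not the library's defaults.
parser_kwargs = {
    'force_combinations': True,
    'force_split': True,
    'normalize': True,
}

try:
    import genderator
except ImportError:
    # Library not available; nothing to run, but the kwargs remain valid.
    genderator = None

if genderator is not None:
    guesser = genderator.Parser(**parser_kwargs)
    print(guesser.guess_gender('David Moreno García'))
```

Keeping the options in one dictionary makes it easy to try different combinations against the same list of names.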

Normalizer options are a dictionary to control what normalization rules are applied to each name. Possible options are:

  • hyphens: Boolean option to enable or disable hyphen removal.
  • symbols: Boolean option to enable or disable symbol removal.
  • whitespaces: Boolean option to enable or disable removal of extra whitespace.
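To make the three rules concrete, here is a small standalone sketch of what such a normalization step could look like. It is not genderator's own normalizer: the `normalize_name` function is hypothetical, and details such as replacing hyphens with spaces (rather than deleting them) are assumptions. Only the option names `hyphens`, `symbols` and `whitespaces` come from the list above.

```python
import re


def normalize_name(name, hyphens=True, symbols=True, whitespaces=True):
    """Illustrative normalizer; NOT the library's implementation."""
    if hyphens:
        # Assumption: a hyphen separates two words, so use a space.
        name = name.replace('-', ' ')
    if symbols:
        # Drop everything except word characters, whitespace and hyphens.
        # \w is Unicode-aware for str, so accented letters are kept.
        name = re.sub(r'[^\w\s-]', '', name)
    if whitespaces:
        # Collapse runs of whitespace and trim both ends.
        name = ' '.join(name.split())
    return name


# A dictionary with the same keys as normalizer_options.
normalizer_options = {'hyphens': True, 'symbols': True, 'whitespaces': True}

print(normalize_name('  Ana-María   López!! ', **normalizer_options))
# Ana María López
```

With the real library, a dictionary of this shape would be passed as `normalizer_options` when creating the parser.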