zahlwort2num

A small package for handy conversion of German numerals (including ordinal and signed ones) written as words into numbers.


Keywords
german, nlp, numeral, converter, deutsch, sprache, ordinal, zahlen, human, number
License
MIT
Install
pip install zahlwort2num==0.4.2

Documentation

ZahlWort2num (v.0.4.2)

🇩🇪 🇩🇪 🇩🇪 A small but useful (given the shortage of, and low-quality support for, lang_de) package for handy conversion of German numerals (incl. ordinal numbers) written as strings into numbers.

To put it differently: It allows reverse text normalization for numbers.

This package might be a good complementary lib to https://github.com/savoirfairelinux/num2words

😿 Currently it doesn't support the Swiss variant. TBD 🇨🇭

Direct PyPI page of the project

https://pypi.org/project/zahlwort2num/

Installation

pip install zahlwort2num

Usage

Definition:

import zahlwort2num as w2n

A few examples:

w2n.convert('Zweihundertfünfundzwanzig') # => 225
w2n.convert('neunte') # => '9.' 
w2n.convert('minus siebenhundert Millionen achtundsiebzig') # => -700000078

or even stuff like: 🙈

w2n.convert('sechshundertdreiundfünfzigtausendfünfhunderteinundzwanzig') # => 653521
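To see why such long compounds are still parseable, here is a toy sketch of how a German numeral below one million decomposes around 'tausend', 'hundert' and 'und'. It is only an illustration and not the code zahlwort2num uses internally; the names SMALL, TENS, below_hundred, below_thousand and below_million are made up for this example.

```python
# Toy illustration only: NOT the implementation used by zahlwort2num.
SMALL = {
    'null': 0, 'ein': 1, 'eins': 1, 'eine': 1, 'zwei': 2, 'drei': 3,
    'vier': 4, 'fünf': 5, 'sechs': 6, 'sieben': 7, 'acht': 8, 'neun': 9,
    'zehn': 10, 'elf': 11, 'zwölf': 12, 'dreizehn': 13, 'vierzehn': 14,
    'fünfzehn': 15, 'sechzehn': 16, 'siebzehn': 17, 'achtzehn': 18,
    'neunzehn': 19,
}
TENS = {
    'zwanzig': 20, 'dreißig': 30, 'vierzig': 40, 'fünfzig': 50,
    'sechzig': 60, 'siebzig': 70, 'achtzig': 80, 'neunzig': 90,
}


def below_hundred(word):
    """Parse 0..99; German puts the unit before the ten ('einundzwanzig')."""
    if word in SMALL:
        return SMALL[word]
    if word in TENS:
        return TENS[word]
    unit, sep, ten = word.partition('und')
    if sep and 0 < SMALL.get(unit, 0) < 10 and ten in TENS:
        return SMALL[unit] + TENS[ten]
    raise ValueError('cannot parse: ' + repr(word))


def below_thousand(word):
    """Parse 0..999 by splitting once at 'hundert'."""
    head, sep, tail = word.partition('hundert')
    if not sep:
        return below_hundred(word)
    hundreds = below_hundred(head) if head else 1  # bare 'hundert' means 100
    return hundreds * 100 + (below_hundred(tail) if tail else 0)


def below_million(word):
    """Parse 0..999999 by splitting once at 'tausend'; case-insensitive."""
    word = word.strip().lower()
    head, sep, tail = word.partition('tausend')
    if not sep:
        return below_thousand(word)
    thousands = below_thousand(head) if head else 1  # bare 'tausend' means 1000
    return thousands * 1000 + (below_thousand(tail) if tail else 0)
```

With this sketch, the long example above splits into 'sechshundertdreiundfünfzig' (653) before 'tausend' and 'fünfhunderteinundzwanzig' (521) after it. Signs, ordinals, 'Million' and larger scales are deliberately left out.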

Command line:

  • (Obviously, it is best to enclose the argument in single quotes, since it may contain spaces.)
bin/zahlwort2num-convert 'eine Million siebenhunderteinundzwanzig'
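The effect of the quotes can be checked with Python's standard shlex module, which follows POSIX shell splitting rules. This is a minimal sketch that only builds and splits command strings; it does not run the script, and the variable names are illustrative.

```python
import shlex

numeral = 'eine Million siebenhunderteinundzwanzig'

# Unquoted, a POSIX shell would split the numeral into three separate arguments.
unquoted = shlex.split('bin/zahlwort2num-convert ' + numeral)

# shlex.join quotes each argument as needed, so the numeral stays one argument.
command = shlex.join(['bin/zahlwort2num-convert', numeral])
quoted = shlex.split(command)
```

Here unquoted has four elements (the script plus three words), while quoted has two (the script plus the whole numeral). shlex.join requires Python 3.8 or newer.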

Development

Before doing anything else, install flake8 locally via

python3 -m pip install -r requirements.txt

Make sure tests are passing

python3 -m unittest

and run the linter locally via

flake8 ./zahlwort2num/*.py --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

WIKI

TBD

Already implemented features 😎

  • In theory it works for any number in the range from 0 up to 999 * 10^27 [big numbers]
  • Command-line mode (see above)
  • Support for ordinal numerals (incl. inflections [suffixes like 'ste', 'ten', etc.])
    In this case it returns a value coerced to a string, e.g. '15.', instead of an integer ☝️
  • Relatively lenient rules regarding trailing whitespace and lower/upper case (input is unified).
  • Handling of signed numerals (also ordinal ones), e.g. 'minus zehn'
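Because cardinals come back as integers and ordinals as strings such as '15.', calling code may want one uniform shape. The helper below is a minimal sketch of that idea; split_result is a hypothetical name and not part of zahlwort2num, and the '-10.' form for signed ordinals is an assumption about the output format.

```python
from typing import Tuple, Union


def split_result(value: Union[int, str]) -> Tuple[int, bool]:
    """Return (number, is_ordinal) for a value shaped like convert()'s output.

    Hypothetical helper: integers are treated as cardinals, strings with a
    trailing '.' (e.g. '15.') as ordinals.
    """
    if isinstance(value, int):
        return value, False
    text = value.strip()
    if text.endswith('.'):
        # Assumption: a signed ordinal looks like '-10.'
        return int(text[:-1]), True
    return int(text), False
```

It would be used as split_result(w2n.convert('neunte')), which under the behaviour described above yields (9, True).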

TODO / Known issues

  • Make POC, functional for all common cases

  • Ordinal number support

  • Take care for exceptions / trailing whitespaces etc.

  • Make structure + publish as PyPI package

  • Command line support 💻

  • Added support for both direct and non-direct forms, e.g. einhundert / hundert

  • Simplify/refactor POC code, add better documentation

  • Zwo variant

  • Added linter with Test Suite as hook

  • More comprehensible tests

  • Swiss variant

  • More fault tolerant (ß -> ss) etc.

  • Larger scale than 10^60

  • Ordinal with very large numbers (without addons) e.g Millionste

  • Few benchmark improvements (e.g tail recursion etc)

  • Better error handling + validation for nonsensical cases (e.g. minus null Milliarde)

  • Support for fractions?

Thanks

  • @warichet for addressing the problem
  • @spatialbitz for writing a simple fix 👍
  • @psawa for adding support for the zwo case
  • ... lastly to any of you who uses this package ;-)