pyconcepticon

programmatic curation of concepticon-data


Keywords
data, linguistics
License
Apache-2.0
Install
pip install pyconcepticon==3.1.0

pyconcepticon

Tooling to access and curate Concepticon data.

Installation

pyconcepticon can be installed from PyPI by running

pip install pyconcepticon

Note that pyconcepticon requires a clone or export of the concepticon data repository.

Usage

To use pyconcepticon you must have a local copy of the Concepticon data, i.e. one of the following:

  • the sources of a released version, as provided in the Downloads section of a release,
  • a clone of the concepticon-data repository (or your personal fork of it), or
  • a released version of the data as archived on ZENODO.

Python API

Assuming you have downloaded release 1.2.0 and unpacked the sources to a directory clld-concepticon-data-41d2bf0, you can access the data as follows:

>>> from pyconcepticon import Concepticon
>>> api = Concepticon('clld-concepticon-data-41d2bf0')
>>> conceptlist = list(api.conceptlists.values())[0]
>>> conceptlist.author
'Perrin, Loïc-Michel'
>>> conceptlist.tags
['annotated']
>>> len(conceptlist.concepts)
110
>>> list(conceptlist.concepts.values())[0]
Concept(
    id='Perrin-2010-110-1', number='1', concepticon_id='1906', concepticon_gloss='SOUR', gloss=None, 
    english='ACID', attributes={'german': 'sauer', 'french': 'acide'}, 
    _list=Conceptlist(
        _api=<pyconcepticon.api.Concepticon object at 0x7f31693be518>, 
        id='Perrin-2010-110', author='Perrin, Loïc-Michel', year=2010, list_suffix='', items=110, 
        tags=['annotated'], source_language=['english', 'french', 'german'], 
        target_language='Global', 
        url='https://journals.dartmouth.edu/cgi-bin/WebObjects/Journals.woa/xmlpage/1/article/353?htmlOnce=yes', 
        refs=['Perrin2010'], pdf=['Perrin2010'], 
        note='This list was used as an initial questionnaire for colexification studies on a world-wide sample of languages.', 
        pages='276f', alias=[], local=False))
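
The attributes shown in the session above are enough for simple summaries. As a minimal sketch, the helper below counts how many concepts of a list are linked to a Concepticon concept set. `mapping_coverage` is a hypothetical name, not part of pyconcepticon, and the sketch assumes that an unmapped concept carries an empty or missing `concepticon_id`.

```python
def mapping_coverage(conceptlist):
    """Return (mapped, total) for a conceptlist-like object.

    Relies only on attributes visible in the session above: `concepts`
    is a dict whose values expose `concepticon_id`.
    """
    concepts = list(conceptlist.concepts.values())
    # Assumption: an unmapped concept has an empty or None concepticon_id.
    mapped = sum(1 for concept in concepts if concept.concepticon_id)
    return mapped, len(concepts)
```

With `api` set up as above, it could be called as `mapping_coverage(list(api.conceptlists.values())[0])`.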

Command line interface

Having installed pyconcepticon, you can also query concept lists directly from the terminal with the concepticon command. To learn about the functionality it provides, run:

$ concepticon -h
usage: concepticon [-h] [--log-level LOG_LEVEL] [--repos REPOS]
                   [--repos-version REPOS_VERSION]
                   COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)
  --repos REPOS         clone of concepticon/concepticon-data
  --repos-version REPOS_VERSION
                        version of repository data. Requires a git clone!
                        (default: None)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    attributes          Print all columns in concept lists that contain
                        surplus information.
...

To learn about an individual subcommand, run concepticon COMMAND -h, e.g.

$ concepticon lookup -h
usage: concepticon lookup [-h]
                          [--format {fancy_grid,fancy_outline,github,grid,html,jira,latex,latex_booktabs,latex_longtable,latex_raw,mediawiki,moinmoin,orgtbl,pipe,plain,presto,pretty,psql,rst,simple,textile,tsv,unsafehtml,youtrack}]
                          [--similarity SIMILARITY] [--full-search]
                          [--language LANGUAGE]
                          GLOSS [GLOSS ...]

Look up the specified glosses in Concepticon.

positional arguments:
  GLOSS

optional arguments:
  -h, --help            show this help message and exit
  --format {fancy_grid,fancy_outline,github,grid,html,jira,latex,latex_booktabs,latex_longtable,latex_raw,mediawiki,moinmoin,orgtbl,pipe,plain,presto,pretty,psql,rst,simple,textile,tsv,unsafehtml,youtrack}
                        Format of tabular output. (default: simple)
  --similarity SIMILARITY
                        specify level of similarity for concept mapping
                        (default: 5)
  --full-search         select between approximate search (default) and full
                        search (default: False)
  --language LANGUAGE   specify your desired language for mapping (default:
                        en)
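
If the lookup needs to be driven from a script, one option is to call the CLI through `subprocess`. The sketch below uses only the options listed in the help output above. The function names `lookup_command` and `run_lookup` are hypothetical, and the sketch assumes that the `concepticon` executable is on the PATH.

```python
import subprocess


def lookup_command(glosses, language="en", similarity=5,
                   full_search=False, repos=None):
    """Build the argument list for `concepticon lookup`."""
    cmd = ["concepticon"]
    if repos:
        # --repos is a global option, so it goes before the subcommand.
        cmd += ["--repos", str(repos)]
    cmd += ["lookup", "--language", language, "--similarity", str(similarity)]
    if full_search:
        cmd.append("--full-search")
    return cmd + list(glosses)


def run_lookup(glosses, **options):
    """Run the lookup and return whatever the CLI prints to stdout."""
    result = subprocess.run(
        lookup_command(glosses, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `run_lookup(["hand", "foot"], repos="clld-concepticon-data-41d2bf0")` would run the lookup against the unpacked release used in the Python API section.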

Configuration

Both the Python API and the CLI can look up the location of the data in a cldfcatalog config file, under the key concepticon.

Such a config file (and the repository clone) can be created automatically by installing cldfbench and running cldfbench config.
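
To check which location such a config file points to, a sketch like the following may help. It rests on assumptions not stated above: that the config file is in INI format and that the concepticon key sits in a section named `clones`. The function name and the `config_file` argument are hypothetical, and the cldfcatalog documentation remains the authority on the actual file location and layout.

```python
import configparser
from pathlib import Path


def concepticon_path(config_file):
    """Return the path stored under the `concepticon` key, or None."""
    parser = configparser.ConfigParser()
    # A missing file is silently skipped by ConfigParser.read().
    parser.read(config_file, encoding="utf-8")
    # Assumption: the key lives in a section named "clones".
    value = parser.get("clones", "concepticon", fallback=None)
    return Path(value) if value else None
```

The returned path could then be passed to `Concepticon(...)` in the same way as the directory name in the Python API section.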