OPUS (opus.nlpl.eu) Python API


Keywords
opus_api, api, parallel, corpora, mmt, corporate, corpus, language-model, machine-learning, opus, parallel-corpora, parallel-corpus, python
License
MIT
Install
pip install opus-api==0.6.2

Documentation

            /$$$$$$            /$$$$$$$  /$$   /$$  /$$$$$$
           /$$__  $$          | $$__  $$| $$  | $$ /$$__  $$
  /$$$$$$$| $$  \ $$  /$$$$$$ | $$  \ $$| $$  | $$| $$  \__/
 /$$_____/| $$  | $$ /$$__  $$| $$$$$$$/| $$  | $$|  $$$$$$
| $$      | $$  | $$| $$  \__/| $$____/ | $$  | $$ \____  $$
| $$      | $$  | $$| $$      | $$      | $$  | $$ /$$  \ $$
|  $$$$$$$|  $$$$$$/| $$      | $$      |  $$$$$$/|  $$$$$$/
 \_______/ \______/ |__/      |__/       \______/  \______/

Requirements

Download PhantomJS and make sure it's in your PATH, e.g.:

$ wget -qO- https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 | tar xvj -C ~/.local/bin --strip 2 phantomjs-2.1.1-linux-x86_64/bin
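
To confirm the requirement is met before running the tool, you can look the binary up on PATH. This is a minimal sketch using only the standard library; the helper name find_on_path is illustrative and is not part of opus_api.

```python
import shutil


def find_on_path(executable):
    """Return the absolute path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    # "phantomjs" is the binary name shipped in the PhantomJS tarball above.
    path = find_on_path("phantomjs")
    if path is None:
        print("phantomjs not found on PATH; install it before using opus_api")
    else:
        print("phantomjs found at", path)
```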

Installation

Stable release

To install Opus API, run this command in your terminal:

$ pip install opus_api

This is the preferred method to install Opus API, as it will always install the most recent stable release.

If you don't have pip installed, the Python installation guide can walk you through the process.

From sources

The sources for Opus API can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/yonkornilov/opus_api

Or download the tarball:

$ curl -OL https://github.com/yonkornilov/opus_api/tarball/master

Once you have a copy of the source, you can install it with:

$ make install

Usage

Find your languages:

$ opus_api langs

[
  ...
  {
    "description": "en (English)",
    "id": 69,
    "name": "en"
  },
  ...
  {
    "description": "ru (Russian)",
    "id": 198,
    "name": "ru"
  },
  ...
]
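
The listing can also be consumed from a script. The sketch below shells out to the documented `opus_api langs` command and builds a name-to-id lookup; it assumes the command prints nothing but the JSON array on stdout, and the function names run_langs and language_ids are hypothetical helpers, not part of the package. The demo at the bottom uses two entries copied from the sample output instead of a live call.

```python
import json
import subprocess


def run_langs():
    """Run `opus_api langs` and return its stdout (assumed to be pure JSON)."""
    result = subprocess.run(
        ["opus_api", "langs"], capture_output=True, text=True, check=True
    )
    return result.stdout


def language_ids(langs_json):
    """Map language names (e.g. "en") to their numeric ids."""
    return {entry["name"]: entry["id"] for entry in json.loads(langs_json)}


# Two entries copied from the sample output above, standing in for a live call.
SAMPLE = """
[
  {"description": "en (English)", "id": 69, "name": "en"},
  {"description": "ru (Russian)", "id": 198, "name": "ru"}
]
"""

if __name__ == "__main__":
    ids = language_ids(SAMPLE)
    print(ids["en"], ids["ru"])  # prints: 69 198
```

With opus_api installed, language_ids(run_langs()) would give the same lookup over the full list.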

Find corpora:

$ opus_api get en ru --maximum 300 --minimum 3

{
  "corpora": [
    {
      "id": 1,
      "name": "OpenSubtitles2016",
      "src_tokens": "157.5M",
      "trg_tokens": "133.6M",
      "url": "http://opus.nlpl.eu/download.php?f=OpenSubtitles2016%2Fen-ru.txt.zip"
    },
  ...
    {
      "id": 13,
      "name": "KDE4",
      "src_tokens": "1.8M",
      "trg_tokens": "1.4M",
      "url": "http://opus.nlpl.eu/download.php?f=KDE4%2Fen-ru.txt.zip"
    }
  ]
}
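
The token counts come back as strings such as "157.5M", so ranking corpora by size needs a small conversion step. The following is a sketch under stated assumptions: only the "M" suffix appears in the sample output, so the "K" and "G" handling is a guess, and parse_tokens and largest_corpus are hypothetical helper names. The sample data is the two entries shown above.

```python
import json
from decimal import Decimal

# Only the "M" suffix appears in the sample output; "K" and "G" are assumptions.
_SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_tokens(text):
    """Convert a count such as "157.5M" into an integer number of tokens."""
    text = text.strip()
    if text and text[-1].upper() in _SUFFIXES:
        # Decimal avoids binary floating-point rounding on values like 133.6.
        return int(Decimal(text[:-1]) * _SUFFIXES[text[-1].upper()])
    return int(Decimal(text))


def largest_corpus(get_json):
    """Return the corpus entry with the most source-side tokens."""
    corpora = json.loads(get_json)["corpora"]
    return max(corpora, key=lambda c: parse_tokens(c["src_tokens"]))


# The two entries from the sample output above.
SAMPLE = """
{
  "corpora": [
    {"id": 1, "name": "OpenSubtitles2016",
     "src_tokens": "157.5M", "trg_tokens": "133.6M",
     "url": "http://opus.nlpl.eu/download.php?f=OpenSubtitles2016%2Fen-ru.txt.zip"},
    {"id": 13, "name": "KDE4",
     "src_tokens": "1.8M", "trg_tokens": "1.4M",
     "url": "http://opus.nlpl.eu/download.php?f=KDE4%2Fen-ru.txt.zip"}
  ]
}
"""

if __name__ == "__main__":
    best = largest_corpus(SAMPLE)
    print(best["name"], parse_tokens(best["src_tokens"]))
```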

TODO

  1. Get: parallel corpora for formats other than MOSES and TMX
  2. New feature: query available languages for corpora set

Credits

This package's CLI is powered by click.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.