invenio-oarepo-multilingual

Multilingual support for OARepo


License
MIT
Install
pip install invenio-oarepo-multilingual==1.1.2

Documentation

OARepo multilingual data model


Multilingual string data model for OARepo.

Installation

    pip install oarepo-multilingual

Usage

The library provides a multilingual type for JSON Schema, with marshmallow validation and deserialization and an Elasticsearch mapping. The multilingual type lets you add multilingual strings to your JSON schema in the format "en": "something", "en-us": "something else", or with a default value under the "_" key: "_": "default value".

JSON Schema

Add this package to your dependencies and reference the type via $ref in your JSON schema as "[server]/schemas/multilingual-v2.0.0.json#/definitions/multilingual".

Usage example

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "title": {
      "$ref": "https://localhost:5000/schemas/multilingual-v2.0.0.json#/definitions/multilingual"
    }
  }
}

{
  "type": "object",
  "properties": {
    "title": {
      "en": "something",
      "en-us": "something else"
    }
  }
}

Marshmallow

Marshmallow support covers data validation and deserialization.

If marshmallow validation is performed within an application context, language keys are validated against the SUPPORTED_LANGUAGES config. Outside an application context, the keys are not checked against a list of languages; instead a generic validation is performed: each key must be in ISO 639-1 format or in the language-region format from RFC 5646.

Usage example

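    import marshmallow

    # MultilingualStringSchemaV2 is provided by this package; import it
    # before running the example.

    class MD(marshmallow.Schema):
        title = MultilingualStringSchemaV2()

    data = {
        "title": {
            "en": "something",
            "en-us": "something else",
        }
    }

    # Validates the language keys and deserializes the data.
    MD().load(data)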

Supported languages validation

You can specify the supported languages in your application configuration under SUPPORTED_LANGUAGES. Only these languages are then allowed as keys of a multilingual string. Languages must be given in the format "en" or "en-us".

Usage example

app.config.update(SUPPORTED_LANGUAGES=["cs", "en"])
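The two validation modes described above can be sketched roughly as follows. This is an illustration only, not the library's code: the function name `invalid_keys` and the regular expression are assumptions, the pattern accepts only two-letter language codes with an optional two-letter region, and the "_" default key is deliberately left out.

```python
import re

# Assumed generic pattern: a two-letter ISO 639-1 code, optionally followed
# by a hyphen and a two-letter region (either case), e.g. "en" or "en-us".
LANGUAGE_KEY = re.compile(r"[a-z]{2}(-[a-zA-Z]{2})?")


def invalid_keys(value, supported_languages=None):
    """Return the keys of a multilingual dict that fail validation.

    With supported_languages given, keys must appear in that list.
    Without it, keys only have to match the generic pattern.
    """
    rejected = []
    for key in value:
        if supported_languages is not None:
            accepted = key in supported_languages
        else:
            accepted = LANGUAGE_KEY.fullmatch(key) is not None
        if not accepted:
            rejected.append(key)
    return rejected
```

For example, `invalid_keys({"en": "a", "de": "b"}, ["cs", "en"])` reports `"de"`, while the same data without a language list passes the generic check.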

Elasticsearch mapping

Set the type of your multilingual field to multilingual in the mapping. In your configuration, define ELASTICSEARCH_DEFAULT_LANGUAGE_TEMPLATE, which is used as the default mapping template for the supported languages.

Default template example

ELASTICSEARCH_DEFAULT_LANGUAGE_TEMPLATE = {
    "type": "text",
    "fields": {
        "keywords": {
            "type": "keyword"
        }
    }
}

You can also specify different templates for specific languages with ELASTICSEARCH_LANGUAGE_TEMPLATES. Append # and an id to the language key to add more than one template for the same language.

Language templates example

ELASTICSEARCH_LANGUAGE_TEMPLATES = {
    "cs": {
        "type": "text",
        "fields": {
            "keywords": {
                "type": "keyword"
            }
        }
    },
    "cs#plain": {
        "type": "text"
    },
    "en": {
        "type": "text",
        "fields": {
            "keywords": {
                "type": "keyword"
            }
        }
    }
}

Instead of a particular language you can use the placeholder '*'; the template is then used for all SUPPORTED_LANGUAGES. The placeholder '*' can be used anywhere in the template, at any level. Currently * is the only supported placeholder, but this will change.

ELASTICSEARCH_LANGUAGE_TEMPLATES = {
    "*#context": {
        "type": "text",
        "copy_to": "field.*",
        "fields": {
            "raw": {
                "type": "keyword"
            }
        }
    }
}
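One way to picture the placeholder is as an expansion step that turns each "*" entry into one entry per supported language. The sketch below is an assumption about the behaviour, not the library's implementation: the names `expand_placeholder` and `_substitute` are hypothetical, and the rule that explicit language entries win over expanded ones is a guess.

```python
def _substitute(node, language):
    """Replace "*" with the language in all keys and string values."""
    if isinstance(node, dict):
        return {
            key.replace("*", language): _substitute(value, language)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [_substitute(item, language) for item in node]
    if isinstance(node, str):
        return node.replace("*", language)
    return node


def expand_placeholder(templates, supported_languages):
    """Expand "*" and "*#id" keys into one template per language."""
    expanded = {}
    # First pass: create an entry for every language from placeholder keys.
    for key, template in templates.items():
        language_part, separator, context = key.partition("#")
        if language_part == "*":
            for language in supported_languages:
                new_key = language + separator + context
                expanded[new_key] = _substitute(template, language)
    # Second pass (assumed precedence): explicit entries override expanded ones.
    for key, template in templates.items():
        if key.partition("#")[0] != "*":
            expanded[key] = template
    return expanded
```

Applied to the "*#context" template with SUPPORTED_LANGUAGES set to "cs" and "en", this yields the keys "cs#context" and "en#context", with copy_to set to "field.cs" and "field.en" respectively.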

Usage example

{
  "mappings": {
    "properties": {
      "title": {"type": "multilingual"}
    }
  }
}

Usage example with context

{
  "mappings": {
    "properties": {
      "title": {"type": "multilingual#plain"}
    }
  }
}
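To make the relationship between the multilingual type and the configured templates more concrete, here is a rough sketch of how such a field could be resolved into a per-language mapping. It is not taken from the library: the function `build_multilingual_mapping`, the resulting object-with-properties shape, and the fallback order (template with id, then plain language template, then default template) are all assumptions made for illustration.

```python
def build_multilingual_mapping(field_type, supported_languages,
                               default_template, language_templates):
    """Resolve "multilingual" or "multilingual#id" into a per-language mapping."""
    # The part after "#" (if any) selects templates registered as "lang#id".
    _, _, context = field_type.partition("#")
    properties = {}
    for language in supported_languages:
        # Assumed lookup order: "lang#id" first, then "lang", then the default.
        candidates = [f"{language}#{context}", language] if context else [language]
        template = default_template
        for candidate in candidates:
            if candidate in language_templates:
                template = language_templates[candidate]
                break
        properties[language] = template
    return {"type": "object", "properties": properties}
```

With only "cs#plain" configured, a "multilingual#plain" field would under these assumptions use the plain template for "cs" and the default template for every other supported language.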

Analyzer configuration

You can specify analyzers in the application configuration with ELASTICSEARCH_LANGUAGE_ANALYSIS. Append # and an id to the language key to add more than one analysis definition for the same language.

Language analysis example

ELASTICSEARCH_LANGUAGE_ANALYSIS = {
    "cs#title": {
        "czech#title": {
            "type": "custom",
            "char_filter": [
                "html_strip"
            ],
            "tokenizer": "standard"
        }
    },
    "cs": {
        "czech": {
            "type": "custom",
            "char_filter": [
                "html_strip"
            ],
            "tokenizer": "standard",
            "filter": [
                "lowercase",
                "stop",
                "snowball"
            ]
        }
    }
}

Usage example

{
  "settings": {
    "analysis": {
      "analyzer": {
        "oarepo:extends": "multilingual_analysis"
      }
    }
  },
  "mappings": {
    ...
  }
}

{
  "settings": {
    "analysis": {
      "analyzer": {
        "oarepo:extends": "multilingual_analysis#title"
      }
    }
  },
  "mappings": {
    ...
  }
}