A Keras-based language model toolkit with a TensorFlow backend.


Keywords
attentions, bert, contrastive-learning, crf, keras, named-entity-recognition, ner, nlp, pretrained-language-models, prompt, prompt-learning, prompt-toolkit, sentence-bert, simcse, tensorflow, text-classification
License
MIT
Install
pip install langml==0.4.2

Documentation

LangML (Language ModeL) is a Keras-based language model toolkit with a TensorFlow backend. It provides mainstream pre-trained language models, e.g., BERT/RoBERTa/ALBERT, and their downstream application models.


Features

  • Common and widely used Keras layers: CRF, Transformer, and attention layers (Additive, ScaledDot, MultiHead, GatedAttentionUnit), among others.
  • Pretrained language models: BERT, RoBERTa, and ALBERT, with friendly interfaces that make it easy to build downstream singleton, shared/unshared two-tower, or multi-tower models.
  • Tokenizers: WPTokenizer (wordpiece) and SPTokenizer (sentencepiece).
  • Baseline models: text classification, named entity recognition, and contrastive learning. No code is required: just preprocess the data into the expected format and use "langml-cli" to train various baseline models.
  • Prompt-based tuning: PTuning.

Installation

You can install or upgrade langml/langml-cli via the following command:

pip install -U langml
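
To verify the installation, you can check the installed version with pip itself:

pip show langml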

Quick Start

Specify the Keras variant

  1. Use pure Keras (the default setting)
export TF_KERAS=0
  2. Use TensorFlow Keras
export TF_KERAS=1
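
The variable can also be set from Python. A minimal sketch, assuming langml reads TF_KERAS at import time (as similar toolkits such as keras-bert and bert4keras do), so it must be set before the first import:

import os

# Assumption: TF_KERAS is read when langml is first imported,
# so set it before importing the package.
os.environ['TF_KERAS'] = '1'

import langml  # now backed by tensorflow.keras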

Load pretrained language models

from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# placeholder paths to a downloaded pretrained checkpoint
config_path = '/path/to/bert_config.json'
checkpoint_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'
# whether the tokenizer should lowercase input text
lowercase = True

# load bert / roberta plm
bert_model, bert = load_bert(config_path, checkpoint_path)
# load albert plm
albert_model, albert = load_albert(config_path, checkpoint_path)
# load wordpiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load sentencepiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)
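
Once loaded, the returned Keras model can run inference directly. The sketch below continues from the block above and assumes the keras-bert-style input convention of (token ids, segment ids) arrays; the concrete ids are hypothetical:

import numpy as np

# Assumption: the model takes [token_ids, segment_ids] as inputs,
# following keras-bert-style implementations.
token_ids = np.array([[101, 7592, 2088, 102]])  # hypothetical ids for "[CLS] hello world [SEP]"
segment_ids = np.zeros_like(token_ids)

# sequence_output shape: (batch_size, sequence_length, hidden_size)
sequence_output = bert_model.predict([token_ids, segment_ids])
print(sequence_output.shape)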

Finetune a model

from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'

bert_model, bert_instance = load_bert(config_path, ckpt_path)
# take the CLS token representation as the sentence embedding
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
# binary classification head on top of the CLS representation
output = L.Dense(2, activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(1e-5))
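
As a quick sanity check, the compiled model can be fitted on random tensors of the right shape. This is illustrative only; real training feeds tokenized text, and the (token ids, segment ids) input convention is an assumption carried over from keras-bert-style models:

import numpy as np

# Random stand-in data for a smoke test; replace with tokenized text for real training.
num_samples, seq_len = 8, 128
x_tokens = np.random.randint(1, 100, size=(num_samples, seq_len))
x_segments = np.zeros_like(x_tokens)
y = keras.utils.to_categorical(np.random.randint(0, 2, size=(num_samples,)), num_classes=2)

train_model.fit([x_tokens, x_segments], y, batch_size=2, epochs=1)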

Use langml-cli to train baseline models

  1. Text Classification
$ langml-cli baseline clf --help
Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...

  classification command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert
  bilstm
  textcnn
  2. Named Entity Recognition
$ langml-cli baseline ner --help
Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...

  ner command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert-crf
  lstm-crf
  3. Contrastive Learning
$ langml-cli baseline contrastive --help
Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...

  contrastive learning command line tools

Options:
  --help  Show this message and exit.

Commands:
  simcse
  4. Text Matching
$ langml-cli baseline matching --help
Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...

  text matching command line tools

Options:
  --help  Show this message and exit.

Commands:
  sbert
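
Each subcommand accepts --help as well, which prints its own training options, for example:

$ langml-cli baseline ner bert-crf --help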

Documentation

Please visit langml.readthedocs.io for the latest documentation.

Reference

The implementation of the pretrained language models is inspired by CyberZHG/keras-bert and bojone/bert4keras.