cltrier_nlp


Keywords: academia, large-language-models, natural-language-processing, teaching
License: Apache-2.0
Install: pip install cltrier_nlp==0.1.8

Documentation

CLTrier NLP: academic teaching toolbox

Usage

Install

pip install cltrier_nlp
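
To confirm that the install succeeded, a minimal sketch using only the Python standard library is shown below. It assumes the distribution is registered under the name `cltrier_nlp`, as in the pip command above; the helper `installed_version` is a hypothetical name introduced for illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    # "cltrier_nlp" is the distribution name taken from the pip command above.
    found = installed_version("cltrier_nlp")
    print(f"cltrier_nlp: {found or 'not installed'}")
```

Running the script inside the environment used for `pip install` should print the installed version; `not installed` usually means a different interpreter or virtual environment is active.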

Development

Install

The project is managed with Poetry, a dependency management and packaging tool. Install Poetry following its official installation guidelines, then set up the local repository as follows:

# install package dependencies
poetry install

# add pre-commit to git hooks
poetry run pre-commit install  

Tests

poetry run pytest
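
As a rough illustration of what pytest collects when the command above runs, the sketch below shows the general shape of a test module: a file named `test_*.py` containing functions named `test_*` that use plain `assert` statements. The file name and the helper `normalize_whitespace` are hypothetical and defined inline only to keep the example self-contained; they are not taken from the `cltrier_nlp` code base.

```python
# tests/test_example.py (hypothetical file name)


def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: collapse whitespace runs into single spaces."""
    return " ".join(text.split())


def test_collapses_inner_whitespace():
    # Runs of spaces and tabs between words become a single space.
    assert normalize_whitespace("a  b\t c") == "a b c"


def test_strips_surrounding_whitespace():
    # Leading and trailing whitespace, including newlines, is removed.
    assert normalize_whitespace("  hello \n") == "hello"
```

In a real test module the helper would be imported from the package under test instead of being defined next to the tests.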

Linting

poetry run pre-commit run --all-files

Project Structure

│
├── Makefile                    <- Makefile containing development targets
├── README.md                   <- top-level README
├── pyproject.toml              <- package-level (poetry) configuration
├── mkdocs.yaml                 <- documentation configuration
├── .pre-commit-config.yaml     <- git pre-commit actions
│
├── cltrier_nlp                 <- root source
│   ├── corpus                  <- nltk inspired corpus module
│   ├── encoder                 <- huggingface auto model wrapper
│   ├── trainer                 <- pytorch training algorithm
│   ├── functional              <- generic helper functions
│   └── utility                 <- utility classes and types
│
├── tests                       <- unittests
│
├── examples                    <- usage/application examples
│
├── scripts                     <- additional package building scripts
│   └── gen_docs_pages.py       <- automatic doc generation based on docstrings
│

ToDos

  • tests: add encoder testing
  • tests: add functional testing
  • tests: add utility testing
  • cltrier_nlp:trainer: modernize and refactor
  • examples:application: encoder with manifold reduction
  • examples:application: encoder with unsupervised clustering
  • examples:application: training pipeline with pytorch MLP

Resources