embeval

A framework for automating the evaluation and visualization of embeddings.


Keywords
data-visualization, embedding-evaluation, semantic-similarity, word-embeddings
License
GPL-3.0
Install
pip install embeval==0.1.3

Documentation

EmbEval

EmbEval is a framework for evaluating an arbitrary number of word embeddings on an arbitrary number of tasks, in parallel.

To make the results easier to interpret, EmbEval uses graphs to visualize how the different types of embeddings perform on each task.
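Besides graphs, the semantic-similarity command also offers a text output format (--output_format text). As an illustration only, a text view can be as simple as a grid with one row per embedding and one column per task. The sketch below assumes the results are held in a dict keyed by (embedding, task); render_table is a hypothetical helper, not EmbEval's own text visualization.

```python
def render_table(results: dict) -> str:
    """Render {(embedding, task): score} as a plain-text grid.

    Hypothetical helper: rows are embeddings, columns are tasks,
    and missing (embedding, task) combinations are shown as "-".
    """
    embeddings = sorted({e for e, _ in results})
    tasks = sorted({t for _, t in results})
    header = ["embedding"] + tasks
    rows = [header]
    for e in embeddings:
        cells = [e]
        for t in tasks:
            score = results.get((e, t))
            cells.append("-" if score is None else f"{score:.3f}")
        rows.append(cells)
    # Pad every column to its widest cell so the grid lines up.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

For example, print(render_table({("emb_a", "set_1"): 0.5, ("emb_b", "set_2"): 0.75})) prints a three-line grid in which each embedding has a "-" for the test set it was not scored on.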

Getting Started

Installation

Install embeval with pip:

pip3 install embeval

Usage (Command Line)

embeval --help
    Usage: embeval [OPTIONS] COMMAND [ARGS]...

    Options:
        --help  Show this message and exit.

    Commands:
        semantic-similarity

embeval semantic-similarity --help
    Usage: embeval semantic-similarity [OPTIONS] EMBEDDING_DIR TESTSET_DIR

    Options:
        --workers INTEGER               Number of worker processes to use.
        --output_path TEXT              Path to write output files to.
        --output_format [text|graph|both]
        --help                          Show this message and exit.

embeval semantic-similarity --output_path output/ embeddings/ testsets/
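The semantic-similarity command takes a directory of embeddings and a directory of test sets. This README does not state which metric is computed; a common methodology for word-similarity benchmarks scores each word pair by the cosine similarity of its vectors and reports Spearman's rank correlation against the human ratings. The sketch below assumes that methodology, an in-memory dict of word vectors, and (word1, word2, human_score) triples; evaluate_similarity and its helpers are hypothetical names, not EmbEval's API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_similarity(embeddings, pairs):
    """Hypothetical evaluator: rho between model and human similarity scores.

    Pairs containing an out-of-vocabulary word are skipped.
    """
    model, gold = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            gold.append(score)
    return spearman(model, gold)
```

Spearman's rho is used here because similarity benchmarks care about whether the model orders word pairs the same way humans do, not about matching the absolute rating scale.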

Using/Extending EmbEval

To extend EmbEval with tasks not covered by the current implementation (contributions are most welcome), five concepts must be implemented:

  • Command (See Semantic Similarity Command) -- This makes your task available from the CLI and controls the flow of execution when the command is invoked. Click is used as the CLI package. The entry point of an extended application must import the main cli object and register all available commands (See main).
  • Processing Pipeline (See generics and Semantic Similarity Pipeline) -- This is where the producer, processor and consumer that execute tasks are implemented. The implementation uses the pseq library and follows its methodology.
  • Store (See Semantic Similarity Store) -- A simple object that keeps track of the evaluation results obtained during the processing pipeline.
  • Task (See Semantic Similarity Task) -- A task object that encapsulates the information shared across the pipeline, such as file paths.
  • Visualization (See text visualization) -- Defines a way of presenting the evaluation results.
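The Task, Store and Processing Pipeline concepts fit together roughly as follows. This is a minimal, hypothetical sketch using only the standard library: Task, Store, run_pipeline and evaluate are illustrative names, not EmbEval's classes, and threads connected by queue.Queue stand in for pseq and the worker processes EmbEval uses. The producer enqueues tasks, several processors evaluate them, and a single consumer records results in the store.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class Task:
    """Hypothetical task: the information shared along the pipeline."""
    embedding_path: str
    testset_path: str


@dataclass
class Store:
    """Hypothetical store: maps (embedding_path, testset_path) to a score."""
    results: dict = field(default_factory=dict)

    def add(self, task: Task, score: float) -> None:
        self.results[(task.embedding_path, task.testset_path)] = score


_DONE = object()  # sentinel: "no more items are coming"


def run_pipeline(tasks: Iterable[Task],
                 evaluate: Callable[[Task], float],
                 workers: int = 2) -> Store:
    """Producer -> processors -> consumer, connected by two queues.

    Errors raised by evaluate are not handled in this sketch.
    """
    task_q: queue.Queue = queue.Queue()
    result_q: queue.Queue = queue.Queue()
    store = Store()

    def processor() -> None:
        while True:
            task = task_q.get()
            if task is _DONE:
                result_q.put(_DONE)  # tell the consumer this worker finished
                break
            result_q.put((task, evaluate(task)))

    def consumer() -> None:
        finished = 0
        while finished < workers:
            item = result_q.get()
            if item is _DONE:
                finished += 1
            else:
                store.add(*item)

    threads = [threading.Thread(target=processor) for _ in range(workers)]
    threads.append(threading.Thread(target=consumer))
    for t in threads:
        t.start()

    # Producer: enqueue every task, then one sentinel per processor.
    for task in tasks:
        task_q.put(task)
    for _ in range(workers):
        task_q.put(_DONE)

    for t in threads:
        t.join()
    return store
```

For example, run_pipeline(tasks, evaluate=my_metric, workers=4), with my_metric a hypothetical scoring function, returns a Store mapping each (embedding_path, testset_path) pair to its score. Only the consumer thread writes to the Store, so no lock is needed around it.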

Plans

  • ☐ Finish Semantic Similarity visualization.
  • ☐ Integrate GLUE tasks via jiant framework.

License

Distributed under the GPL-3.0 License. See the LICENSE file for details.