Library for executable ML pipelines represented by KGs.


Keywords: data-science, knowledge-graph-construction, machine-learning, machine-learning-pipelines, python
License: AGPL-3.0
Install: pip install exe-kg-lib==2.1.2

ExeKGLib

Python library for conveniently constructing and executing Machine Learning (ML) pipelines represented by Knowledge Graphs (KGs). It features a coding interface and a CLI, and allows the user to:

  1. Construct an ML pipeline that takes a CSV file as input and processes the data using any of the available tasks and methods.
  2. Save the constructed pipeline as a KG in Turtle format.
  3. Execute the generated KG.

The coding interface is demonstrated with three sample Python files. The pipelines represented by the generated sample KGs are briefly explained below:

  1. ML pipeline: Loads features and labels from an input CSV dataset, splits the data, trains and tests a k-NN model, and visualizes the prediction errors.
  2. Statistics pipeline: Loads a feature from an input CSV dataset, normalizes it, and plots its values (before and after normalization) using a scatter plot.
  3. Visualization pipeline: Loads a feature from an input CSV dataset and plots its values using a line plot.

Under the hood, ExeKGLib uses well-known Python libraries such as pandas, matplotlib, and scikit-learn for data processing, visualization, and prediction.
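To give a rough idea of what the first sample pipeline (split, train k-NN, test, compute errors) amounts to, here is a minimal sketch written directly against scikit-learn. It does not use ExeKGLib's own interface. The function name run_knn_sketch, the synthetic data, the split ratio, the number of neighbours, and the definition of "error" as predicted label minus real label are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def run_knn_sketch(random_state=0):
    """Plain scikit-learn stand-in for the sample ML pipeline (not ExeKGLib code)."""
    # Synthetic stand-in for the input CSV: feature matrix X and label vector y.
    X, y = make_classification(n_samples=200, n_features=5, random_state=random_state)

    # Data splitting: hold out 25% of the rows for testing (assumed ratio).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )

    # Training: fit a k-NN classifier on the training split (k=5 is an assumption).
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_train, y_train)

    # Testing: predict labels for both splits.
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)

    # Performance calculation: per-sample error, assumed here to be
    # predicted label minus real label.
    train_err = pred_train - y_train
    test_err = pred_test - y_test
    return train_err, test_err


if __name__ == "__main__":
    train_err, test_err = run_knn_sketch()
    print("misclassified training samples:", int((train_err != 0).sum()))
    print("misclassified test samples:", int((test_err != 0).sum()))
```

The visualization step is left out so that the sketch runs without a display; the two error vectors are what a plotting step would consume.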

ExeKGLib is described in the following paper, submitted to ESWC 2023:
Klironomos A., Zhou B., Tan Z., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E.: ExeKGLib: A Python Library for Machine Learning Analytics based on Knowledge Graphs

Detailed information (installation, documentation, etc.) about ExeKGLib can be found on its website; basic information is given below.

Installation

To install, run pip install exe-kg-lib.

For detailed installation instructions, refer to the installation page of ExeKGLib's website.

Ready-to-use ML-related tasks and methods

| KG schema (abbreviation) | Task | Method | Properties | Input (data structure) | Output (data structure) | Implemented by Python class |
|---|---|---|---|---|---|---|
| Machine Learning (ml) | Train | KNNTrain | - | DataInTrainX (Matrix or Vector); DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector); DataOutTrainModel (SingleValue) | TrainKNNTrain |
| Machine Learning (ml) | Train | MLPTrain | - | DataInTrainX (Matrix or Vector); DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector); DataOutTrainModel (SingleValue) | TrainMLPTrain |
| Machine Learning (ml) | Train | LRTrain | - | DataInTrainX (Matrix or Vector); DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector); DataOutTrainModel (SingleValue) | TrainLRTrain |
| Machine Learning (ml) | Test | KNNTest | - | DataInTestModel (SingleValue); DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestKNNTest |
| Machine Learning (ml) | Test | MLPTest | - | DataInTestModel (SingleValue); DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestMLPTest |
| Machine Learning (ml) | Test | LRTest | - | DataInTestModel (SingleValue); DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestLRTest |
| Machine Learning (ml) | PerformanceCalculation | PerformanceCalculationMethod | - | DataInTrainRealY (Matrix or Vector); DataInTrainPredictedY (Matrix or Vector); DataInTestPredictedY (Matrix or Vector); DataInTestRealY (Matrix or Vector) | DataOutMLTestErr (Vector); DataOutMLTrainErr (Vector) | PerformanceCalculationPerformanceCalculationMethod |
| Machine Learning (ml) | Concatenation | ConcatenationMethod | - | DataInConcatenation (list of Vector) | DataOutConcatenatedData (Matrix) | ConcatenationConcatenationMethod |
| Machine Learning (ml) | DataSplitting | DataSplittingMethod | - | DataInDataSplittingX (Matrix or Vector); DataInDataSplittingY (Matrix or Vector) | DataOutSplittedTestDataX (Matrix or Vector); DataOutSplittedTrainDataY (Matrix or Vector); DataOutSplittedTrainDataX (Matrix or Vector); DataOutSplittedTestDataY (Matrix or Vector) | DataSplittingDataSplittingMethod |
| Visualization (visu) | CanvasTask | CanvasMethod | hasCanvasName (string); hasLayout (string) | - | - | CanvasTaskCanvasMethod |
| Visualization (visu) | PlotTask | LineplotMethod | hasLineStyle (string); hasLineWidth (int); hasLegendName (string) | DataInVector (Vector) | - | PlotTaskLineplotMethod |
| Visualization (visu) | PlotTask | ScatterplotMethod | hasLineStyle (string); hasLineWidth (int); hasScatterSize (int); hasLegendName (string) | DataInVector (Vector) | - | PlotTaskScatterplotMethod |
| Statistics (stats) | TrendCalculationTask | TrendCalculationMethod | - | DataInTrendCalculation (Vector) | DataOutTrendCalculation (Vector) | TrendCalculationTaskTrendCalculationMethod |
| Statistics (stats) | NormalizationTask | NormalizationMethod | - | DataInNormalization (Vector) | DataOutNormalization (Vector) | NormalizationTaskNormalizationMethod |
| Statistics (stats) | ScatteringCalculationTask | ScatteringCalculationMethod | - | DataInScatteringCalculation (Vector) | DataOutScatteringCalculation (Vector) | ScatteringCalculationTaskScatteringCalculationMethod |
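In every row of the table, the implementing Python class is named by concatenating the task name and the method name. The sketch below records two rows as plain data and derives the class name from them. CatalogueEntry and its fields are hypothetical names introduced for illustration and are not part of ExeKGLib.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CatalogueEntry:
    """One row of the task/method table (hypothetical helper, not library code)."""

    schema: str                      # KG schema abbreviation, e.g. "ml"
    task: str                        # task name, e.g. "Train"
    method: str                      # method name, e.g. "KNNTrain"
    inputs: Tuple[str, ...] = ()     # input entity names
    outputs: Tuple[str, ...] = ()    # output entity names

    @property
    def python_class(self) -> str:
        # Naming pattern seen in the table: task name followed by method name.
        return f"{self.task}{self.method}"


# Two rows copied from the table above.
knn_train = CatalogueEntry(
    schema="ml",
    task="Train",
    method="KNNTrain",
    inputs=("DataInTrainX", "DataInTrainY"),
    outputs=("DataOutPredictedValueTrain", "DataOutTrainModel"),
)
lineplot = CatalogueEntry(
    schema="visu",
    task="PlotTask",
    method="LineplotMethod",
    inputs=("DataInVector",),
)

if __name__ == "__main__":
    for entry in (knn_train, lineplot):
        print(entry.schema, entry.python_class)
```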

Usage

Creating an ML pipeline

  • Via code: See the provided examples. To fetch them to your working directory for easy access, run typer exe_kg_lib.cli.main run get-examples.
  • Step-by-step via CLI: Run typer exe_kg_lib.cli.main run create-pipeline.

Executing an ML pipeline

  • Via code: See example code.
  • Via CLI: Run typer exe_kg_lib.cli.main run run-pipeline <pipeline_path>.
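The CLI commands listed in this section can also be launched from a script. The following is a minimal sketch that only assembles the documented command lines and hands them to subprocess. The helper names exekg_cli_command and run_exekg_cli and the file name my_pipeline.ttl are hypothetical, and the sketch assumes the typer executable is on the PATH.

```python
import subprocess
from typing import List


def exekg_cli_command(subcommand: str, *args: str) -> List[str]:
    """Build the argument list for `typer exe_kg_lib.cli.main run <subcommand> ...`."""
    return ["typer", "exe_kg_lib.cli.main", "run", subcommand, *args]


def run_exekg_cli(subcommand: str, *args: str) -> int:
    """Run the command and return its exit code (hypothetical convenience wrapper)."""
    completed = subprocess.run(exekg_cli_command(subcommand, *args), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Fetch the examples, then execute a pipeline KG; the path is a placeholder.
    run_exekg_cli("get-examples")
    run_exekg_cli("run-pipeline", "my_pipeline.ttl")
```

Passing the arguments as a list avoids shell quoting problems when the pipeline path contains spaces.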

Adding a new ML-related task and method

Extending ExeKGLib in this way takes three steps:

  1. Select the relevant bottom-level KG schema (Statistics, ML, or Visualization) according to the type of the new task and method.
  2. Add the new semantic components (entities, properties, etc.) to the selected KG schema.
  3. Add a Python class to the corresponding module of the exe_kg_lib.classes.tasks package.

For steps 2 and 3, refer to the relevant page of ExeKGLib's website.
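As a rough illustration of step 3, the sketch below shows what a class for a new Statistics task might look like. Everything in it is hypothetical: the SmoothingTask / MovingAverageMethod pair, the run_method name and signature, the window parameter, and the DataOutSmoothing output key are invented for this example. The class is named task plus method, following the pattern of the existing classes. The base class and method signature that ExeKGLib actually requires are documented on its website and are not reproduced here.

```python
from typing import Dict, List


class SmoothingTaskMovingAverageMethod:
    """Hypothetical class for a 'SmoothingTask' / 'MovingAverageMethod' pair.

    Stand-alone sketch: it does not inherit from any ExeKGLib base class.
    """

    def __init__(self, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be a positive integer")
        self.window = window

    def run_method(self, data_in: List[float]) -> Dict[str, List[float]]:
        """Return a trailing moving average, keyed by an assumed output name."""
        smoothed = []
        for i in range(len(data_in)):
            # Average the current value with up to (window - 1) preceding values.
            start = max(0, i - self.window + 1)
            chunk = data_in[start : i + 1]
            smoothed.append(sum(chunk) / len(chunk))
        return {"DataOutSmoothing": smoothed}


if __name__ == "__main__":
    task = SmoothingTaskMovingAverageMethod(window=2)
    print(task.run_method([1.0, 3.0, 5.0]))
```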

Documentation

See the Code Reference and Development sections of ExeKGLib's website.

External resources

KG schemata

The above KG schemata are included in the ExeKGOntology repository.

Dataset used in code examples

The dataset was generated using the sklearn.datasets.make_classification() function of the scikit-learn Python library.
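A comparable dataset can be produced along the following lines. This is a sketch only: the sample count, feature count, random seed, column names, and output file name are assumptions and do not reproduce the parameters used for the dataset shipped with the examples.

```python
import csv

from sklearn.datasets import make_classification


def write_sample_csv(path, n_samples=100, n_features=4, random_state=0):
    """Generate a synthetic classification dataset and save it as a CSV file."""
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=random_state
    )
    # Column names are placeholders chosen for this sketch.
    header = [f"feature_{i}" for i in range(n_features)] + ["label"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row, label in zip(X, y):
            writer.writerow([float(v) for v in row] + [int(label)])
    return header


if __name__ == "__main__":
    write_sample_csv("sample_dataset.csv")
```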

License

ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.