legoai

An open-source package from LEGOAI for identifying data types


Keywords: data-science, llm, machine-learning, ontology, python
License: MIT
Install: pip install legoai==0.3



Empowering Business Users With Self Serve Analytics

What is it?

This project uses machine learning models to identify and classify the data types of values in tabular data. It is designed to plug into data preprocessing and analysis pipelines, automating the tedious and error-prone task of identifying data types by hand.


Getting Started

To get started quickly, install the package and follow the notebook below.

Datatype Identification (Inference)

Inference Notebook

Important

An openai_api_key is required to run L2 model inference.

Main Features

L1 and L2 Datatype Categorization

L1 and L2 Model

  • The pipeline has two models. The L1 model (a classifier) identifies the basic datatypes: integer, float, alphanumeric, range_type, date & time, open_ended_text and close_ended_text.
  • The L2 model further refines the L1 results. Integer and float columns are classified as measure, dimension, or unknown (when they cannot be classified) using an LLM, and date & time columns are mapped to one of 41 date-time formats (e.g. YYYY-MM-DDTHH:MM:SS, YYYY/MM/DD, MM-DD-YYYY HH:MM AM/PM) using regular expressions.
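The date & time step of the L2 model can be pictured as matching the values of a column against a list of regular expressions, one per supported format. The sketch below is a hypothetical illustration, not legoai's implementation: it covers only the three example formats named above (the real model supports 41), and the names DATE_FORMAT_PATTERNS and detect_date_format are invented for this example. It returns the first format whose pattern fully matches every non-empty value, or None.

```python
import re

# Hypothetical subset of the supported formats: only the three examples
# named in this README. More specific patterns are listed first.
DATE_FORMAT_PATTERNS = [
    ("YYYY-MM-DDTHH:MM:SS", re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")),
    ("YYYY/MM/DD", re.compile(r"\d{4}/\d{2}/\d{2}")),
    ("MM-DD-YYYY HH:MM AM/PM", re.compile(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2} (?:AM|PM)")),
]


def detect_date_format(values):
    """Return the first format whose pattern fully matches every non-empty value, else None."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for name, pattern in DATE_FORMAT_PATTERNS:
        if all(pattern.fullmatch(v) for v in cleaned):
            return name
    return None


if __name__ == "__main__":
    print(detect_date_format(["2023-01-15T08:30:00", "2023-02-01T17:45:10"]))  # YYYY-MM-DDTHH:MM:SS
    print(detect_date_format(["2023/01/15", "2023/02/01"]))                    # YYYY/MM/DD
    print(detect_date_format(["01-15-2023 08:30 AM", "hello"]))                # None
```

Requiring every value to match keeps one malformed entry from being silently accepted; a real pipeline might instead tolerate a small fraction of mismatches.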

Datatype Identification Inference Workflow

DI Inference Workflow
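The workflow routes each column according to its L1 label. The following sketch is a hypothetical illustration of that dispatch logic only, not legoai's code: integer and float columns go to an LLM step that should return measure, dimension or unknown, date & time columns go to a regex format matcher, and all other L1 labels get no L2 label. Both L2 steps are passed in as plain callables (route_l2, classify_numeric and match_date_format are invented names), so no OpenAI key or trained model is needed to run it.

```python
from typing import Callable, Optional, Sequence

# L1 labels as listed in this README.
NUMERIC_LABELS = {"integer", "float"}
DATE_LABEL = "date & time"


def route_l2(
    l1_label: str,
    values: Sequence[str],
    classify_numeric: Callable[[Sequence[str]], str],
    match_date_format: Callable[[Sequence[str]], Optional[str]],
) -> Optional[str]:
    """Return the L2 label for a column, or None when L2 does not apply."""
    if l1_label in NUMERIC_LABELS:
        label = classify_numeric(values)
        # Any answer other than measure/dimension is reported as unknown.
        return label if label in {"measure", "dimension"} else "unknown"
    if l1_label == DATE_LABEL:
        return match_date_format(values)
    # alphanumeric, range_type, open_ended_text, close_ended_text: no L2 step.
    return None


def fake_llm(values: Sequence[str]) -> str:
    """Stub standing in for the LLM call; always answers "measure"."""
    return "measure"


def fake_date_matcher(values: Sequence[str]) -> Optional[str]:
    """Stub standing in for the regex matcher; always answers one format."""
    return "YYYY/MM/DD"


if __name__ == "__main__":
    print(route_l2("float", ["1.5", "2.0"], fake_llm, fake_date_matcher))        # measure
    print(route_l2("date & time", ["2023/01/15"], fake_llm, fake_date_matcher))  # YYYY/MM/DD
    print(route_l2("alphanumeric", ["A1"], fake_llm, fake_date_matcher))        # None
```

Injecting the two L2 steps as callables keeps the routing testable on its own and makes it easy to swap the LLM backend.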

Where to get it?

Binary installers for the latest released version are available on the Python Package Index (PyPI).

# PyPI
> pip install legoai

Performance

Note

Source Ecommerce: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
Total Tables: 9, Total Columns: 52
Source Healthcare: https://mitre.box.com/shared/static/aw9po06ypfb9hrau4jamtvtz0e5ziucz.zip
Total Tables: 18, Total Columns: 249

Classification Report (L1 Model)

L1 Model Classification Metrics

Classification Report (L2 Model)

L2 Model Classification Metrics

Execution Chart (Google Colab Environment)

DI Execution Chart

License

The project is released under the MIT License.

Contributing

Contributions to this project are welcome. To contribute, follow these steps:

  1. Fork the repository.
  2. Create a new branch named feature/* (e.g. git checkout -b feature/my-feature).
  3. Make your changes.
  4. Commit your changes (git commit -am 'Add new feature').
  5. Push the branch (git push origin feature/my-feature).
  6. Create a new Pull Request.