😎 Cocoon provides LLM agents that organize your data warehouse so it is ready for analysis.
Given a data task from the user, Cocoon connects to your data warehouse, explores your data, guides you step by step, and automatically builds a SQL pipeline to complete the task.
👉 Check out the demo
- Profile: Semantically understand your data and detect anomalies
- Preview: Staging, Data Cleaning, Data Preparation
- Preview: Entity Matching, Fuzzy Join, Column Standardization
- Preview: Table Transformation, Fuzzy Union, Common Data Model
- Preview: Data Model, Data Vault
- And more to come...
Profiling is the first step to understanding the table and identifying any anomalies.
Many small decisions require semantic understanding by LLMs. For example, an age of 100 is acceptable, but -1 is impossible!
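To make the age example concrete, here is a minimal sketch of the kind of check that semantic understanding leads to. It is not Cocoon's code: the function name and the bounds (0 to 125) are assumptions for illustration, and the hard-coded rule stands in for a judgment that an LLM would make from the column's meaning.

```python
def find_impossible_ages(ages, low=0, high=125):
    """Return (index, value) pairs for ages outside a plausible human range.

    The bounds are illustrative assumptions, not values taken from Cocoon.
    """
    flagged = []
    for i, age in enumerate(ages):
        if age is None:
            continue  # missing values are a separate concern from impossible ones
        if age < low or age > high:
            flagged.append((i, age))
    return flagged


ages = [34, 100, -1, 57, None]
flagged = find_impossible_ages(ages)
print(flagged)  # [(2, -1)]: 100 is kept, -1 is flagged
```

A fixed rule like this only works once someone knows the column holds human ages; deciding that, and picking sensible bounds, is the part that needs semantic understanding.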
- 👉 Online Service: Drop your CSV, and the profile will be ready in under 10 minutes
- 👉 Python Package: Check out the notebook to interactively profile your table in Python
- (Both run the same code. The Python package requires an LLM API, but it is interactive and has no limit on table size or number of columns.)
Check out more profiles
Dataset Title | Profile Link |
---|---|
AQI and Latitude/Longitude of Countries | View Profile |
2020 Property Sales Data | View Profile |
AAC Shelter Cat Outcome | View Profile |
Books | View Profile |
Cancer | View Profile |
Divorces 2000-2015 | View Profile |
German Credit Data | View Profile |
K-Drama | View Profile |
Patients | View Profile |
Used Car Data | View Profile |
Cite Cocoon Profile
@article{huang2024cocoon,
title={Cocoon: Semantic Table Profiling Using Large Language Models},
author={Huang, Zezhou and Wu, Eugene},
journal={arXiv preprint arXiv:2404.12552},
year={2024}
}
Screenshot: LLMs interactively suggest data cleaning steps (casting columns and fixing letter case). The output is dbt staging SQL/YAML.
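As a rough illustration of what a staging query with casts and case fixes can look like, the sketch below builds one from a list of per-column rules. Everything here is hypothetical: `build_staging_sql`, the `raw_patients` table, and the column names are made up for the example, and this is not how Cocoon generates its output.

```python
def build_staging_sql(source_table, column_rules):
    """Build a SELECT that casts and case-normalizes columns.

    column_rules is a list of (column, cast_type, case_fix) tuples, where
    cast_type is a SQL type name or None, and case_fix is "lower", "upper",
    or None. All names are illustrative assumptions.
    """
    select_items = []
    for column, cast_type, case_fix in column_rules:
        expr = column
        if case_fix == "lower":
            expr = f"LOWER(TRIM({expr}))"
        elif case_fix == "upper":
            expr = f"UPPER(TRIM({expr}))"
        if cast_type is not None:
            expr = f"CAST({expr} AS {cast_type})"
        select_items.append(f"    {expr} AS {column}")
    return "SELECT\n" + ",\n".join(select_items) + f"\nFROM {source_table}"


rules = [
    ("patient_id", "INTEGER", None),   # stored as text, should be a number
    ("admitted_on", "DATE", None),     # stored as text, should be a date
    ("gender", None, "lower"),         # mixed case such as "F", "f", " F "
]
staging_sql = build_staging_sql("raw_patients", rules)
print(staging_sql)
```

In a dbt project the `FROM` clause would normally reference a source rather than a literal table name, and the matching YAML file would document each column.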
Joins can be challenging when a standardized join key is missing (e.g., joining on non-standardized names).
Cocoon helps you find the related records and explains how they are related.
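For intuition about the problem, here is a small baseline that joins two lists of names by character similarity, using only Python's standard library. It is a stand-in, not Cocoon's method: the `normalize` and `fuzzy_join` helpers and the 0.8 threshold are assumptions, and string similarity cannot explain how two records are related, which is the part the LLM-based approach targets.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def fuzzy_join(left, right, threshold=0.8):
    """Pair each left name with its most similar right name.

    Returns (left, right, score) tuples whose similarity score is at
    least `threshold`; left names without a good match are dropped.
    """
    matches = []
    for l_name in left:
        best, best_score = None, 0.0
        for r_name in right:
            score = SequenceMatcher(None, normalize(l_name), normalize(r_name)).ratio()
            if score > best_score:
                best, best_score = r_name, score
        if best is not None and best_score >= threshold:
            matches.append((l_name, best, round(best_score, 2)))
    return matches


left = ["Columbia University", "Acme Corp."]
right = ["columbia university", "ACME Corp", "Globex"]
matches = fuzzy_join(left, right)
print(matches)
```

Both left names match here because they differ from their counterparts only in case and punctuation. Names that refer to the same entity but share few characters (abbreviations, former names) would be missed by this baseline.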
Cite Cocoon Fuzzy Join
@article{huang2024disambiguate,
title={Disambiguate Entity Matching through Relation Discovery with Large Language Models},
author={Huang, Zezhou},
journal={arXiv preprint arXiv:2403.17344},
year={2024}
}
Give us the source table and the target table (or an example of it), and we help you fuzzy union or transform the data.
Give us a database, and we design the data model or data vault.
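One piece of a fuzzy union is lining up source columns with target columns. The sketch below does this by name similarity with the standard library, then reshapes rows into the target layout. It is an illustrative baseline under assumed names (`map_columns`, `reshape_rows`, and the sample columns are hypothetical), not Cocoon's implementation.

```python
from difflib import get_close_matches


def map_columns(source_columns, target_columns, cutoff=0.6):
    """Map each source column to the most similarly named target column."""
    lowered_targets = {t.lower(): t for t in target_columns}
    mapping = {}
    for col in source_columns:
        hits = get_close_matches(col.lower(), list(lowered_targets), n=1, cutoff=cutoff)
        if hits:
            mapping[col] = lowered_targets[hits[0]]
    return mapping


def reshape_rows(rows, mapping, target_columns):
    """Rewrite source rows into the target layout; unmapped targets stay None."""
    reshaped = []
    for row in rows:
        new_row = dict.fromkeys(target_columns)
        for source_col, target_col in mapping.items():
            if source_col in row:
                new_row[target_col] = row[source_col]
        reshaped.append(new_row)
    return reshaped


source_columns = ["Customer_Name", "EMAIL", "zip"]
target_columns = ["customer_name", "email", "postal_code"]
mapping = map_columns(source_columns, target_columns)
rows = [{"Customer_Name": "Ada", "EMAIL": "ada@example.com", "zip": "10027"}]
reshaped = reshape_rows(rows, mapping, target_columns)
print(mapping)
print(reshaped)
```

Note that `zip` is left unmapped: it means the same thing as `postal_code` but the names share almost no characters. Matches like that depend on meaning rather than spelling.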
We are working on tools to help understand data, break down silos, and maintain pipelines for the data warehouse.
These will make discovering tables, generating reports, and making predictions incredibly simple.
Email zh2408@columbia.edu to learn more...