great-expectations

Always know what to expect from your data.


Keywords
data, science, testing, pipeline, quality, dataquality, validation, datavalidation, cleandata, data-engineering, data-profilers, data-profiling, data-quality, data-science, data-unit-tests, datacleaner, datacleaning, dataunittest, eda, exploratory-analysis, exploratory-data-analysis, exploratorydataanalysis, mlops, pipeline-debt, pipeline-testing, pipeline-tests
License
Apache-2.0
Install
pip install great-expectations==1.0.0a2



About GX OSS

GX OSS is a data quality platform designed by and for data engineers. It helps you surface issues quickly and clearly while also making it easier to collaborate with nontechnical stakeholders.

Its powerful technical tools start with Expectations: expressive and extensible unit tests for your data. As you create and run tests, your test definitions and results are automatically rendered in human-readable plain-language Data Docs.

Expectations and Data Docs create verifiability and clarity throughout your data quality process. That means you can spend less time translating your work for others, and more time achieving real mutual understanding across your entire organization.
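
To make the idea of "unit tests for your data" concrete, here is a minimal plain-Python sketch. It does not use the GX OSS API: the function name borrows the naming style of GX Expectations, but the function itself, the result fields, the `render_plain_language` helper, and the sample rows are all hypothetical and written only for illustration.

```python
from typing import Any, Dict, List


def expect_column_values_to_not_be_null(
    rows: List[Dict[str, Any]], column: str
) -> Dict[str, Any]:
    """Hand-written stand-in for an Expectation (not the GX implementation)."""
    # Collect the positions of rows where the column is missing or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "column": column,
        "success": not unexpected,
        "element_count": len(rows),
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


def render_plain_language(result: Dict[str, Any]) -> str:
    """Turn a result into one readable sentence (illustrative, not Data Docs)."""
    status = "passed" if result["success"] else "failed"
    return (
        f"Values in '{result['column']}' must not be null: {status} "
        f"({result['unexpected_count']} of {result['element_count']} rows unexpected)."
    )


# Hypothetical sample data: one order is missing its amount.
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 5.00},
]

result = expect_column_values_to_not_be_null(rows, "amount")
print(render_plain_language(result))
```

The same check doubles as a test and as documentation: the result dictionary is what a pipeline would act on, and the rendered sentence is what a nontechnical stakeholder would read.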

Data science and data engineering teams use GX OSS to:

  • Validate data they ingest from other teams or vendors.
  • Test data for correctness post-transformation.
  • Proactively prevent low-quality data from moving downstream and becoming visible in data products and applications.
  • Streamline knowledge capture from subject-matter experts and make implicit knowledge explicit.
  • Develop rich, shared documentation of their data.

Learn more about how data teams are using GX OSS in case studies from Great Expectations.

See Down with pipeline debt for an introduction to our pipeline data quality testing philosophy.

Our upcoming 1.0 release

We’re planning a ton of work to take GX OSS to the next level as we move to 1.0!

Our biggest goal is to improve the user and contributor experiences by streamlining the API, based on the feedback we’ve received from the community (thank you!) over the years.

Learn more about our plans for 1.0 and how we’ll be making this transition in our blog post.

Get started

GX recommends deploying GX OSS within a virtual environment. For more information about getting started with GX OSS, see Get started with Great Expectations.

  1. Run the following command in an empty base directory inside a Python virtual environment to install GX OSS:

    pip install great_expectations
  2. Run the following Python code to import the great_expectations module and create a Data Context:

    import great_expectations as gx
    
    context = gx.get_context()
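
If the import in step 2 fails, it can help to confirm that the package is actually installed in the active virtual environment. The sketch below uses only the Python standard library (`importlib.metadata`, available in Python 3.8 and later); the helper name `installed_gx_version` is hypothetical and not part of GX OSS.

```python
from importlib import metadata
from typing import Optional


def installed_gx_version() -> Optional[str]:
    """Return the installed great-expectations version, or None if it is absent."""
    try:
        # Look up the distribution by its PyPI name in the current environment.
        return metadata.version("great-expectations")
    except metadata.PackageNotFoundError:
        return None


version = installed_gx_version()
if version is None:
    print("great-expectations is not installed in this environment")
else:
    print(f"great-expectations {version} is installed")
```

Note that an unpinned `pip install` normally skips pre-releases, so the version reported here may differ from a pinned pre-release such as `1.0.0a2`.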

Get support

Contribute

We deeply value the contributions and engagement of our community. We’re temporarily pausing the acceptance of new pull requests (PRs). We’ll be updating the API and codebase frequently and significantly over the next few months, and we don’t want contributors to spend time and effort on work that a breaking change then invalidates.

Hold onto your fantastic ideas and PRs until after the 1.0 release, when we will be excited to resume accepting them. We appreciate your understanding and support as we make this final push toward this exciting milestone. Watch for updates in our Slack community, and thank you for being a crucial part of our journey!

Code of conduct

Everyone interacting in GX OSS project codebases, Discourse forums, Slack channels, and email communications is expected to adhere to the GX Community Code of Conduct.