Airflow Sample Provider
Guidelines on building, deploying, and maintaining provider packages that will help Airflow users interface with external systems. Maintained with ❤️ by Astronomer.
This repository provides best practices for building, structuring, and deploying Airflow provider packages as independent python modules available on PyPI.
Provider repositories must be public on Github and follow the structural and technical guidelines laid out in this Readme. Ensure that all of these requirements have been met before submitting a provider package for community review.
Here, you'll find information on requirements and best practices for key aspects of your project:
- File formatting
- Development
- Airflow integration
- Documentation
- Testing
Formatting Standards
Before writing and testing the functionality of your provider package, ensure that your project follows these formatting conventions.
Package name
The highest level directory in the provider package should be named in the following format:
airflow-provider-<provider-name>
Repository structure
All provider packages must adhere to the following file structure:
├── LICENSE # A license is required, MIT or Apache is preferred.
├── README.md
├── sample_provider # Your package import directory. This will contain all Airflow modules and example DAGs.
│ ├── __init__.py
│ ├── example_dags
│ │ └── sample.py
│ ├── hooks
│ │ ├── __init__.py
│ │ └── sample.py
│ ├── operators
│ │ ├── __init__.py
│ │ └── sample.py
│ └── sensors
│ ├── __init__.py
│ └── sample.py
├── pyproject.toml # A file to define dependencies and how the package is built and shipped.
└── tests # Unit tests for each module.
├── __init__.py
├── hooks
│ ├── __init__.py
│ └── test_sample_hook.py
├── operators
│ ├── __init__.py
│ └── test_sample_operator.py
└── sensors
├── __init__.py
└── test_sample_sensor.py
Development Standards
If you followed the formatting guidelines above, you're now ready to start editing files to include standard package functionality.
Python Packaging Scripts
Your pyproject.toml
file should contain all of the appropriate metadata and dependencies required to build your package. Use the sample pyproject.toml
file in this repository as a starting point for your own project.
To improve discoverability of your provider package on PyPI, it is recommended to add classifiers to the package's metadata. The following standard classifiers should be used in addition to any others you may choose to include:
- Framework :: Apache Airflow
- Framework :: Apache Airflow :: Provider
Managing Dependencies
When building providers, these guidelines will help you avoid potential for dependency conflicts:
- It is important that the providers do not include dependencies that conflict with the underlying dependencies for a particular Airflow version. All of the default dependencies included in the core Airflow project can be found in the Airflow setup.cfg file.
- Keep all dependencies relaxed at the upper bound. At the lower bound, specify minor versions (for example,
depx >=2.0.0, <3
).
Versioning
Use standard semantic versioning for releasing your package. When cutting a new release, be sure to update all of the relevant metadata fields in your setup file.
Building Modules
All modules must follow a specific set of best practices to optimize their performance with Airflow:
-
All classes should always be able to run without access to the internet. The Airflow Scheduler parses DAGs on a regular schedule. Every time that parse happens, Airflow will execute whatever is contained in the
init
method of your class. If thatinit
method contains network requests, such as calls to a third party API, there will be problems due to repeated network calls. - Init methods should never call functions which return valid objects only at runtime. This will cause a fatal import error when trying to import a module into a DAG. A common best practice for referencing connectors and variables within DAGs is to use Jinja Templating.
-
All operator modules need an
execute
method. This method defines the logic that the operator will implement.
Modules should also take advantage of native Airflow features that allow your provider to:
- Register custom connection types, which improve the user experience when connecting to your tool.
- Include
extra-links
that link your provider back to its page on the Astronomer Registry. This provides users easy access to documentation and example DAGs.
Refer to the Airflow Integration Standards
section for more information on how to build in these extra features.
Unit testing
Your top-level tests/
folder should include unit tests for all modules that exist in the repository. You can write tests in the framework of your choice, but the Astronomer team and Airflow community typically use pytest.
You can test this package by running: python3 -m unittest
from the top-level of the directory.
Airflow Integration Standards
Airflow exposes a number of plugins to interface from your provider package. We highly encourage provider maintainers to add these plugins because they significantly improve the user experience when connecting to a provider.
Defining an entrypoint
To enable custom connections, you first need to define an apache_airflow_provider
entrypoint in your pyproject.toml
file:
[project.entry-points.apache_airflow_provider]
provider_info = "sample_provider.__init__:get_provider_info"
Next, you need to add a get_provider_info
method to the __init__
file in your top-level provider folder. This function needs to return certain metadata associated with your package in order for Airflow to use it at runtime:
__version__ = "1.0.0"
def get_provider_info():
return {
"package-name": "airflow-provider-sample", # Required
"name": "Sample", # Required
"description": "A sample template for Apache Airflow providers.", # Required
"connection-types": [
{"connection-type": "sample", "hook-class-name": "sample_provider.hooks.sample.SampleHook"}
],
"extra-links": ["sample_provider.operators.sample.SampleOperatorExtraLink"],
"versions": [__version__], # Required
}
Once you define the entrypoint, you can use native Airflow features to expose custom connection types in the Airflow UI, as well as additional links to relevant documentation.
Adding Custom Connection Forms
Airflow enables custom connection forms through discoverable hooks. The following is an example of a custom connection form for the Fivetran provider:
Add code to the hook class to initiate a discoverable hook and create a custom connection form. The following code defines a hook and a custom connection form:
class SampleHook(BaseHook):
"""
Hook docstring ...
"""
conn_name_attr = "sample_conn_id"
default_conn_name = "sample_default"
conn_type = "sample"
hook_name = "Sample"
@staticmethod
def get_connection_form_widgets() -> dict[str, Any]:
"""Returns connection widgets to add to connection form"""
from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
from flask_babel import lazy_gettext
from wtforms import PasswordField, StringField
return {
"account": StringField(lazy_gettext("Account"), widget=BS3TextFieldWidget()),
"secret_key": PasswordField(lazy_gettext("Secret Key"), widget=BS3PasswordFieldWidget()),
}
@staticmethod
def get_ui_field_behaviour() -> dict:
"""Returns custom field behaviour"""
import json
return {
"hidden_fields": ["port", "password", "login", "schema"],
"relabeling": {},
"placeholders": {
"extra": json.dumps(
{
"example_parameter": "parameter",
},
indent=4,
),
"account": "HeirFlough",
"secret_key": "mY53cr3tk3y!",
"host": "https://www.httpbin.org",
},
}
Some notes about using custom connections:
-
get_connection_form_widgets()
creates extra fields using flask_appbuilder. A variety of field types can be created using this function, such as strings, passwords, booleans, and integers. -
get_ui_field_behaviour()
is a JSON schema describing the form field behavior. Fields can be hidden, relabeled, and given placeholder values. -
To connect a form to Airflow, add the hook class name and connection type of a discoverable hook to
"connection-types"
in theget_provider_info
method as mentioned inDefining an entrypoint
.
Adding Custom Links
Operators can add custom links that users can click to reach an external source when interacting with an operator in the Airflow UI. This link can be created dynamically based on the context of the operator. The following code example shows how to initiate an extra link within an operator:
from airflow.models import BaseOperator, BaseOperatorLink
class SampleOperatorExtraLink(BaseOperatorLink):
name = "Astronomer Registry"
def get_link(self, operator: BaseOperator, *, ti_key=None):
return "https://registry.astronomer.io"
class SampleOperator(BaseOperator):
"""
Operator docstring ...
"""
operator_extra_links = (SampleOperatorExtraLink(),)
To connect custom links to Airflow, add the operator class name to "extra-links"
in the get_provider_info
method mentioned above.
Documentation Standards
Creating excellent documentation is essential for explaining the purpose of your provider package and how to use it.
Inline Module Documentation
Every Python module, including all hooks, operators, sensors, and transfers, should be documented inline via sphinx-templated docstrings. These docstrings should be included at the top of each module file and contain three sections separated by blank lines:
- A one-sentence description explaining what the module does.
- A longer description explaining how the module works. This can include details such as code blocks or blockquotes. For more information Sphinx markdown directives, read the Sphinx documentation.
- A declarative definition of parameters that you can pass to the module, templated per the example below.
For a full example of inline module documentation, see the example operator in this repository.
README
The README for your provider package should give users an overview of what your provider package does. Specifically, it should include:
- High-level documentation about the provider's service.
- Steps for building a connection to the service from Airflow.
- What modules exist within the package.
- An exact set of dependencies and versions that your provider has been tested with.
- Guidance for contributing to the provider package.
Functional Testing Standards
To build your repo into a python wheel that can be tested, follow the steps below:
-
Clone the provider repo.
-
cd
into provider directory. -
Run
python3 -m pip install build
. -
Run
python3 -m build
to build the wheel. -
Find the .whl file in
/dist/*.whl
. -
Download the Astro CLI.
-
Create a new project directory, cd into it, and run
astro dev init
to initialize a new astro project. -
Ensure the Dockerfile contains an Astro Runtime image that supports at least Airflow 2.3.0. For example:
FROM quay.io/astronomer/astro-runtime:8.0.0
-
Copy the
.whl
file to the top level of your project directory. -
Install
.whl
in your containerized environment by adding the following to your Dockerfile:
RUN pip install --user airflow_provider_<PROVIDER_NAME>-0.0.1-py3-none-any.whl
- Copy your sample DAG to the
dags/
folder of your astro project directory. - Run
astro dev start
to build the containers and run Airflow locally (you'll need Docker on your machine). - When you're done, run
astro dev stop
to wind down the deployment. Runastro dev kill
to kill the containers and remove the local Docker volume. You can also useastro dev kill
to stop the environment before rebuilding with a new.whl
file.
Note: If you are having trouble accessing the Airflow webserver locally, there could be a bug in your wheel setup. To debug, run
docker ps
, grab the container ID of the scheduler, and rundocker logs <scheduler-container-id>
to inspect the logs.
Publishing your Provider repository for the Astronomer Registry
If you have never submitted your Provider repository for publication to the Astronomer Registry, create a new release/tag for your repository on the main
branch. Ultimately, the backend of the Astronomer Registry will check for new tags for a Provider repository to trigger adding the new version of the Provider on the Registry.
NOTE: Tags for the repository must follow typical semantic versioning.
Now that you've created a release/tag, head over to the Astronomer Registry and fill out the form with your shiny new Provider repo details!
If your Provider is currently on the Astronomer Registry, simply create a new release/tag will trigger an update to the Registry and the new version will be published.