
Freeflow

Airflow development and deployment, hopefully simplified.

What is this?

Ever think of starting another Airflow project and find yourself doing the same things over and over again? I did. Starting a project, for me, has always meant putting the whole development and deployment process in place so we could collaborate flawlessly and make sure no one could easily f*ck up our production.

Those things consist of a standardized folder structure, tracked Airflow variables and connections, encryption and decryption of secrets, pre-deployment tests of DAGs, and a deployment script.

Installation

The package can be installed by running:

pip install freeflow

How to start?

Quite easy, just create a new project with this command:

freeflow start your_awesome_project

It'll generate a folder structure, examples, and a base script for you to use.

How to use it?

Generated Template

The template generates the folder structure and some example files to get you started. It also generates the manage.py script.

This file is used to run CLI commands that will help your development and deployment. You can run it with a normal Python command: python manage.py -h

Folder Structure

├── conf
├── dags
├─┬ data
┆ ├── conf
┆ ├── conn
┆ ├── pool
┆ └── vars
├── test 
└── manage.py

Airflow Environments

You can make different environments behave differently in terms of variables, connections, pools, or deployment.

By adding a new file with the *.cfg extension to the conf folder, you add a new environment type. By default, a default.cfg is generated from the template as an example.

[imports]
conf =  # The name of Airflow configuration file to be imported
vars =  # The name of Airflow variables file to be imported
conn =  # The name of folder to be imported for connections
pool =  # The name of folder to be imported for pools

[composer] # This used when you want to deploy it to Google Composer
project_id = 
name = 
location = 
bucket = 

To use an environment, simply add --env env_name when using manage.py. Example: python manage.py --env stg init
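
For instance, a staging environment might look like the sketch below in a hypothetical conf/stg.cfg; all names and values here are placeholders, and the exact value format follows the comments in the generated template:

[imports]
conf = airflow_override
vars = stg_vars
conn = stg
pool = stg

[composer]
project_id = my-gcp-project
name = my-composer-environment
location = us-central1
bucket = my-composer-bucket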

Changing Airflow Configuration

An example of this can be seen in the data/conf/ folder. You just put the configuration values that you want to override.

[core]
load_examples = False

Adding Variables

Adding variables is as simple as putting a JSON file into the data/vars/ folder and importing it via the environment configuration (conf/). The JSON file follows the format that Airflow generates when you export variables via the UI.
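
As an illustration, a hypothetical data/vars/example_vars.json following that export format is just a flat JSON object mapping variable names to values:

{
  "environment": "staging",
  "alert_email": "team@example.com"
}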

Adding Connections

Adding a connection is also done by putting a JSON file containing all the connection details. A basic template for the JSON is:

{
  "conn_id": "example_connection", /* Required */
  "conn_type": "", /* Required */
  "conn_uri": "",
  "conn_host": "",
  "conn_login": "",
  "conn_password": "",
  "conn_port": "",
  "conn_schema": "",
  "conn_extra": {
    "extra__example": "value" /* Differs for each connection type available */
  }
}

Security Concern

Usually, connection credentials cannot be put on git (or anywhere else) as plaintext. And if you still do, you shouldn't.

This framework helps you with that by automatically using mozilla/sops for encryption and decryption. To use this feature, encrypt your connection JSON file using sops and put it in the folder with *.enc.* in the filename.

Those files will automatically be decrypted when you deploy Airflow. The process will not leave behind any data that could expose the decrypted secrets.
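
For example, assuming sops is already configured with an encryption key (KMS, GCP KMS, or PGP), encrypting a connection file could look like this; the file names are placeholders:

sops --encrypt data/conn/example_connection.json > data/conn/example_connection.enc.json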

Adding Pools

Similar to adding a connection, adding pools also uses JSON, with this template:

{
  "name": "example_pool",
  "slot_count": 10,
  "pool_description": "An example of pool"
}

Adding DAGs

Adding DAGs is pretty much the same as in normal Airflow. Just put them in the dags/ folder.
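
For reference, a minimal DAG dropped into dags/ could look like the sketch below. It is plain Airflow (1.10-style imports assumed), nothing framework-specific, and all names are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# A minimal example DAG; freeflow picks it up from dags/ like any other DAG file.
dag = DAG(
    dag_id="example_dag",
    schedule_interval="@daily",
    start_date=datetime(2020, 1, 1),
    catchup=False,
)

start = DummyOperator(task_id="start", dag=dag)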

Running Airflow

Running Locally

To run locally, first you have to initialize Airflow by using the command:

python manage.py --env your_env init

Then, deploy Airflow by using:

python manage.py --env your_env deploy --type direct

Next, run Airflow as usual by using airflow webserver and airflow scheduler.

Running on Composer

Running on Composer doesn't need initialization. Just make sure that you have already put the needed Composer environment data in your environment configuration, and then run:

python manage.py --env your_composer_env deploy --type composer

Make sure that you are already authenticated using gcloud auth application-default login and that your account has access to deploy to Composer.

Running Tests

The tests are divided into two kinds: general and DAGs.

General Tests

General tests are tests provided by this template to make sure that your scripts run smoothly without triggering the red error ribbon at the top of the Airflow UI.

These tests include:

  • DAG checks: name, validity, task names.
  • ExternalTaskSensor checks: make sure it doesn't get stuck in a silly way (e.g. referring to a wrong DAG name).
  • BigQueryOperator checks: dry-run all the queries to make sure no invalid query gets pushed.
  • ... hopefully more

DAGs Tests

These are the tests that the user creates; basically, the tests that reside in the tests/ folder.
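
A minimal sketch of such a user test, assuming pytest and a plain DagBag import check (the relative path to the DAGs folder is an assumption), could be:

import os

from airflow.models import DagBag

def test_dags_import_without_errors():
    # Load every DAG from the project's dags/ folder and fail on any import error.
    dag_folder = os.path.join(os.path.dirname(__file__), "..", "dags")
    dag_bag = DagBag(dag_folder=dag_folder, include_examples=False)
    assert not dag_bag.import_errors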

Disclaimer

This is heavily experimental and I do not guarantee anything. If you have any issues or feedback, please put them on this repository's issue page.

Have a great day!