airflow-metrics-gbq

Airflow metrics to Google BigQuery


License
BSD-3-Clause-Attribution
Install
pip install airflow-metrics-gbq==0.1.0

Documentation

Airflow Metrics to BigQuery


Sends Airflow metrics to BigQuery.


Installation

pip install airflow-metrics-gbq

Usage

  1. Activate StatsD metrics in airflow.cfg:
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
  2. Restart the webserver and the scheduler:
systemctl restart airflow-webserver.service
systemctl restart airflow-scheduler.service
  3. Check that Airflow is sending out metrics:
nc -l -u localhost 8125
  4. Install this package
  5. Create the required tables (counters, gauges and timers); an example is shared here
  6. Create materialized views which refresh when the base table changes, as described here
  7. Create a simple Python script, monitor.py, to provide the configuration:
from airflow_metrics_gbq.metrics import AirflowMonitor

if __name__ == '__main__':
    monitor = AirflowMonitor(
        host="localhost",  # StatsD host (airflow.cfg)
        port=8125,  # StatsD port (airflow.cfg)
        gcp_credentials="path/to/service/account.json",  # service account key file
        dataset_id="monitoring",  # dataset where the monitoring tables are
        counts_table="counts",  # counters table
        last_table="last",  # gauges table
        timers_table="timers",  # timers table
    )
    monitor.run()
  8. Run the program, ideally in the background, to start sending metrics to BigQuery:
python monitor.py &
  9. The logs can be viewed in the GCP console, in Google Cloud Logging, under the airflow_monitoring app_name.
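
As an alternative to nc for checking that Airflow is sending out metrics, the sketch below listens on the StatsD UDP port and prints each metric it receives. It is not part of this package: `StatsdMetric`, `parse_statsd_line` and `listen` are hypothetical names used for illustration, and the parser assumes the plain StatsD line format `name:value|type` (`c` for counters, `g` for gauges, `ms` for timers), ignoring any sample rate or tags.

```python
import socket
from typing import NamedTuple, Optional


class StatsdMetric(NamedTuple):
    name: str
    value: float
    kind: str  # "c" = counter, "g" = gauge, "ms" = timer


def parse_statsd_line(line: str) -> Optional[StatsdMetric]:
    """Parse one StatsD line of the form `name:value|type[|@rate]`."""
    try:
        name, rest = line.strip().split(":", 1)
        fields = rest.split("|")
        value, kind = float(fields[0]), fields[1]
    except (ValueError, IndexError):
        return None  # not a well-formed StatsD line
    return StatsdMetric(name, value, kind)


def listen(host: str = "localhost", port: int = 8125) -> None:
    """Print every metric received on the StatsD UDP port until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _ = sock.recvfrom(65535)
            # A single datagram may carry several newline-separated metrics.
            for line in data.decode("utf-8", errors="replace").splitlines():
                metric = parse_statsd_line(line)
                if metric is not None:
                    print(metric)


if __name__ == "__main__":
    listen()
```

Like nc, this listener binds the StatsD port, so stop it before starting monitor.py on the same host and port.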

Future releases

  • Increase test coverage (unit and integration tests)
  • Add proper type annotations, with mypy support and checks
  • Provide more configurable options
  • Provide better documentation