sqlizer

Orchestration service for SQL-only ETL workflows.


Keywords
microservice, ETL, SQL, Workflow, Pipeline, DWH, data, warehouse, airflow, luigi, orchestration
License
Apache-2.0
Install
pip install sqlizer==0.0.1

Documentation

What is SQLizer

A simple orchestration service for SQL-only ETL workflows.

Why SQLizer

In many cases SQL alone is enough for ETL (extract/transform/load) pipelines, relying on CTAS (CREATE TABLE AS) queries and the built-in import/export features of your RDBMS or data warehouse software (e.g. Redshift).
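As a minimal sketch of what such a step looks like, a single transform can be expressed as a CTAS query and run directly against the warehouse; the cluster endpoint, credentials, schemas and table names below are placeholders, not part of SQLizer itself:

psql "host=some-cluster.redshift.amazonaws.com port=5439 dbname=database user=root" \
  -c "CREATE TABLE analytics.daily_orders AS
      SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
      FROM staging.orders
      GROUP BY order_date;"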

This service was born out of the need to orchestrate a complete data processing pipeline using AWS Redshift only.

Setting up the development environment

python3 -m venv ./.venv
echo ".venv/" >> .gitignore
source .venv/bin/activate
pip install -e .

Optionally install development/test dependencies:

pip install pytest pytest-runner codecov pytest-cov recommonmark

Prepare the docker image (and test it):

docker build -t sqlizer .
docker run --rm  --name sqlizer-runner -e "job_id=sqlizer" -e "bucket=sss" sqlizer

Prepare test data:

aws s3 mb s3://sqlizer-workflows --profile your-profile
aws s3 sync ~/Code/sqlizer/test-data/ s3://sqlizer-workflows --profile your-profile
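Optionally, double-check that the test data landed in the bucket:

aws s3 ls s3://sqlizer-workflows --recursive --profile your-profile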

Add parameters to the AWS Systems Manager Parameter Store:

aws ssm put-parameter --overwrite --name sqlizer.default.auth --value user:password --type SecureString --description "authentication details for data-source" --profile your-profile
aws ssm put-parameter --overwrite --name sqlizer.default.host --value "some-cluster.redshift.amazonaws.com:5439/database" --type SecureString --description "url access for default data source" --profile your-profile
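If you want to confirm the stored values before running the service (this step is not required by SQLizer itself), you can read them back decrypted:

aws ssm get-parameter --name sqlizer.default.auth --with-decryption --profile your-profile
aws ssm get-parameter --name sqlizer.default.host --with-decryption --profile your-profile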

Run it locally:

export AWS_PROFILE=your-profile
#sqlizer --connection-url="root:some_secret_pass@some-cluster.redshift.amazonaws.com:5439/database" --bucket="s3://sqlizer-workflows"
sqlizer