What is SQLizer
A simple orchestration service for SQL-only ETL workflows.
Why SQLizer
In many cases, ETL (extract/transform/load) pipelines can be built with SQL alone, relying on CTAS (CREATE TABLE AS) queries and the built-in import/export features of your RDBMS or data warehouse software (e.g. Redshift).
This service was born out of the need to orchestrate a complete data processing pipeline using AWS Redshift only.
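To illustrate the CTAS pattern mentioned above, here is a minimal, hypothetical sketch: each transform step materializes a new table from a SELECT statement. The `ctas` helper is not part of SQLizer's API; it only demonstrates the kind of SQL such a workflow is built from.

```python
def ctas(table_name, select_sql):
    """Wrap a SELECT statement in a CREATE TABLE AS (CTAS) query.

    Hypothetical helper for illustration only: a SQL-only transform
    step is just a SELECT whose result is persisted as a new table.
    """
    return f"CREATE TABLE {table_name} AS ({select_sql});"


# Example: aggregate raw sales into a daily summary table.
query = ctas("daily_totals", "SELECT day, SUM(amount) FROM sales GROUP BY day")
```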
Setting up the development environment
python3 -m venv ./.venv
echo ".venv/" >> .gitignore
source .venv/bin/activate
pip install -e .
Optionally install development/test dependencies:
pip install pytest pytest-runner codecov pytest-cov recommonmark
Prepare the docker image (and test it):
docker build -t sqlizer .
docker run --rm --name sqlizer-runner -e "job_id=sqlizer" -e "bucket=sss" sqlizer
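The docker run command above passes configuration to the container as environment variables (job_id, bucket). A minimal sketch of how such variables could be read and validated inside the service — the function name and error handling are assumptions, not SQLizer's actual code:

```python
import os


def read_config(environ=None):
    """Read job configuration from environment variables.

    Hypothetical sketch: mirrors the -e "job_id=..." -e "bucket=..."
    flags shown in the docker run command.
    """
    environ = os.environ if environ is None else environ
    job_id = environ.get("job_id")
    bucket = environ.get("bucket")
    if not job_id or not bucket:
        raise ValueError("both job_id and bucket must be set")
    return {"job_id": job_id, "bucket": bucket}
```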
Prepare test data:
aws s3 mb s3://sqlizer-workflows --profile your-profile
aws s3 sync ~/Code/sqlizer/test-data/ s3://sqlizer-workflows --profile your-profile
Add parameters to the Systems Manager's Parameter Store:
aws ssm put-parameter --overwrite --name sqlizer.default.auth --value user:password --type SecureString --description "authentication details for data-source" --profile your-profile
aws ssm put-parameter --overwrite --name sqlizer.default.host --value "some-cluster.redshift.amazonaws.com:5439/database" --type SecureString --description "url access for default data source" --profile your-profile
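The two put-parameter commands above follow a naming convention of the form sqlizer.&lt;data-source&gt;.&lt;key&gt; (here, the "default" data source with its auth and host keys). A small sketch of building such names programmatically — the helper itself is an assumption for illustration, not part of SQLizer:

```python
def parameter_name(key, data_source="default"):
    """Build an SSM Parameter Store name following the
    sqlizer.<data-source>.<key> convention used above.

    Hypothetical helper; useful, e.g., before calling
    boto3's ssm get_parameter with WithDecryption=True.
    """
    return f"sqlizer.{data_source}.{key}"
```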
Run it locally:
export AWS_PROFILE=your-profile
#sqlizer --connection-url="root:some_secret_pass@some-cluster.redshift.amazonaws.com:5439/database" --bucket="s3://sqlizer-workflows"
sqlizer
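The commented-out invocation above shows a connection URL of the form user:password@host:port/database. A minimal, hypothetical sketch of splitting such a URL into its parts — the function is for illustration and is not SQLizer's actual parser:

```python
def parse_connection_url(url):
    """Split a user:password@host:port/database URL into components.

    Hypothetical sketch only; assumes the exact shape shown in the
    commented-out sqlizer invocation above.
    """
    auth, _, location = url.partition("@")
    user, _, password = auth.partition(":")
    hostport, _, database = location.partition("/")
    host, _, port = hostport.partition(":")
    return {
        "user": user,
        "password": password,
        "host": host,
        "port": int(port),
        "database": database,
    }
```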