Databricks Deployment Utils


License
MIT
Install
pip install autobricks==0.9.0

Documentation

Documentation

https://autobricks.readthedocs.io/en/latest/

Development Setup

Create virual environment and install dependencies for local development:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Configuration & Authentication

Azure databricks allows the following authentication methods:

This library supports all by simply setting the following environment variables as required for each method. The AUTH_TYPE sets the authorisation mode and therefore what configuration to expect. Ensure that sensitive values are managed using secret redaction e.g. key vault or some other method.

Variable User PAT SP SP on Mgmt Endpoint
AUTH_TYPE USER SERVICE_PRINCIPAL SERVICE_PRINCIPAL_MGMT_ENDPOINT
DBUTILSTOKEN ✓ ✓ ✓
TENANT_ID ✓ ✓
SP_CLIENT_ID ✓ ✓
SP_CLIENT_SECRET ✓ ✓
AD_RESOURCE ✓ ✓
MGMT_RESOURCE_ENDPOINT ✓
WORKSPACE_NAME ✓
RESOURCE_GROUP ✓
SUBSCRIPTION_ID ✓
DATABRICKS_API_HOST ✓ ✓ ✓

Also note that library will use simple rest calls to retrieve tokens. An alternative approach is to leverage the adal library to authenticate. If you wish to authenticate over adal please refer to following settings:

Variable SP SP on Mgmt Endpoint
AUTH_TYPE SERVICE_PRINCIPAL_ADAL SERVICE_PRINCIPAL_MGMT_ENDPOINT_ADAL
DBUTILSTOKEN ✓ ✓
TENANT_ID ✓ ✓
SP_CLIENT_ID ✓ ✓
SP_CLIENT_SECRET ✓ ✓
AD_RESOURCE ✓ ✓
MGMT_RESOURCE_ENDPOINT ✓
WORKSPACE_NAME ✓
RESOURCE_GROUP ✓
SUBSCRIPTION_ID ✓
DATABRICKS_API_HOST ✓ ✓

When deciding between adal and rest calls the following is relevant:

  • adal may not work in Azure DevOps in highly restricted private network setups
  • Rest calls are more simple however if MS makes any changes to the API's the adal isn't there to abstract those changes.

The following variables will default.

Variable Default
AUTH_TYPE SERVICE_PRINCIPAL
AD_RESOURCE 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
MGMT_RESOURCE_ENDPOINT https://management.core.windows.net/

For testing, development and deployment and deployment substitute your own values between the angled brackets:

export AUTH_TYPE=USER
# export AUTH_TYPE=SERVICE_PRINCIPAL
# export AUTH_TYPE=SERVICE_PRINCIPAL_MGMT_ENDPOINT
# export AUTH_TYPE=SERVICE_PRINCIPAL_ADAL
# export AUTH_TYPE=SERVICE_PRINCIPAL_MGMT_ENDPOINT_ADAL

# required for AUTH_TYPE=USER
export DBUTILSTOKEN="<my_token>"

# required for AUTH_TYPE=SERVICE_PRINCIPAL
export TENANT_ID="<my_tenant_id>"
export SP_CLIENT_ID=<my_service_principal_client_id>
export SP_CLIENT_SECRET=<my_service_principal_secret>
export AD_RESOURCE=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

# required for AUTH_TYPE in (SERVICE_PRINCIPAL or SERVICE_PRINCIPAL_MGMT_ENDPOINT)
export MGMT_RESOURCE_ENDPOINT=https://management.core.windows.net/
export WORKSPACE_NAME=my_dbx_workspacename
export RESOURCE_GROUP=my_dbx_resourcegroup
export SUBSCRIPTION_ID=<my_subscription_id>
export DATABRICKS_API_HOST=<my_databricks_host_url>

Exporting variables doesn't make for a great development experience so the project uses pytest.ini for mock environment variables. For example:

[pytest]
env =
    DATABRICKS_API_HOST=https://<my_databricks_host>.azuredatabricks.net
    DBUTILSTOKEN=<my_token>
    ...

REMINDER: do NOT commit pytest.ini that contains real security tokens

For convenience here is a yaml template for docker and build pipelines:

env:
    # AUTH_TYPE: USER
    AUTH_TYPE: SERVICE_PRINCIPAL
    # AUTH_TYPE: SERVICE_PRINCIPAL_MGMT_ENDPOINT
    # AUTH_TYPE: SERVICE_PRINCIPAL_ADAL
    # AUTH_TYPE: SERVICE_PRINCIPAL_MGMT_ENDPOINT_ADAL

    # required for AUTH_TYPE=USER
    # DBUTILSTOKEN: $(token)

    # required for AUTH_TYPE=SERVICE_PRINCIPAL
    TENANT_ID: $(tenant_id)
    SP_CLIENT_ID: $(service_principal_client_id)
    SP_CLIENT_SECRET: $(service_principal_secret)

    # required for AUTH_TYPE in (SERVICE_PRINCIPAL or SERVICE_PRINCIPAL_MGMT_ENDPOINT)
    WORKSPACE_NAME: $(dbx_workspace_name)
    RESOURCE_GROUP: $(dbx_resource_group)
    SUBSCRIPTION_ID: $(subscription_id)
    DATABRICKS_API_HOST: $(databricks_host_url)

Build

Build python wheel:

python setup.py sdist bdist_wheel

There is a CI build configured for this repo that builds on main origin and publishes to PyPi.

Test

Dependencies for testing:

pip install --editable .

Run tests:

pytest

Test Coverage:

pytest --cov=autobricks --cov-report=html

View the report in a browser:

./htmlcov/index.html