datacatalog-tag-manager

A package to manage Google Cloud Data Catalog tags, loading metadata from external sources


Keywords
bigdata, csv-import, data-governance, datacatalog, gcp, gcp-datacatalog, google-cloud, python
License
MIT
Install
pip install datacatalog-tag-manager==2.2.0


A Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources. Currently supports the CSV file format.

1. Environment setup

1.1. Python + virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.1.1. Install Python 3.6+

1.1.2. Create a folder

This is recommended so that all related files reside in the same place, making the instructions below easier to follow.

mkdir ./datacatalog-tag-manager
cd ./datacatalog-tag-manager

All paths starting with ./ in the next steps are relative to the datacatalog-tag-manager folder.

1.1.3. Create and activate an isolated Python environment

pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
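
After activating the environment, you may want to confirm that the interpreter meets the 3.6+ requirement and that the virtualenv is actually in use. The sketch below relies only on the standard library; the function name check_environment is hypothetical and not part of datacatalog-tag-manager.

```python
import sys


def check_environment(min_version=(3, 6)):
    """Return (version_ok, in_virtualenv) for the running interpreter."""
    version_ok = sys.version_info[:2] >= tuple(min_version)
    # virtualenv < 20 sets sys.real_prefix; venv and virtualenv >= 20
    # make sys.prefix differ from sys.base_prefix instead.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return version_ok, in_virtualenv


if __name__ == "__main__":
    ok, isolated = check_environment()
    print("Python version OK:", ok)
    print("Inside a virtual environment:", isolated)
```

Run it with the same python3 that the activated environment provides.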

1.1.4. Install the package

pip install --upgrade datacatalog-tag-manager

1.2. Docker

Docker may be used as an alternative to run datacatalog-tag-manager. In this case, disregard the virtualenv setup instructions above.

1.2.1. Get the source code

git clone https://github.com/ricardolsmendes/datacatalog-tag-manager
cd ./datacatalog-tag-manager

1.3. Auth credentials

1.3.1. Create a service account and grant it the roles below

  • BigQuery Metadata Viewer
  • Data Catalog TagTemplate User
  • A custom role with bigquery.datasets.updateTag and bigquery.tables.updateTag permissions

1.3.2. Download a JSON key and save it as

  • ./credentials/datacatalog-tag-manager.json

1.3.3. Set the environment variable

This step may be skipped if you're using Docker.

export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-tag-manager.json
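
A relative path such as ./credentials/datacatalog-tag-manager.json is resolved against the current working directory, so running the CLI from another folder can break authentication. The hypothetical helper below (not part of the package) checks that the variable is set, that the file exists, and that it looks like a service account key, which carries a "type": "service_account" entry.

```python
import json
import os


def load_service_account_key(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Read the key file named by env_var and return its client_email."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        # Relative paths are resolved against the current working directory.
        raise FileNotFoundError(f"Credentials file not found: {path}")
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    # JSON keys downloaded for service accounts declare this type.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key.get("client_email")
```

The returned e-mail is the identity that needs the roles listed in step 1.3.1.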

2. Manage Tags

2.1. Create or Update

2.1.1. From a CSV file

  • SCHEMA

The metadata schema to create or update Tags is presented below. Each line describes one Tag field; use as many lines as needed to describe all the Tags and Fields you need.

| Column | Description | Mandatory |
| --- | --- | --- |
| linked_resource OR entry_name | Full name of the BigQuery or Pub/Sub asset the Entry refers to, or the Entry name if you are working with Custom Entries | yes |
| template_name | Resource name of the Tag Template for the Tag | yes |
| column | Attach Tags to a column belonging to the Entry schema | no |
| field_id | Id of the Tag field | yes |
| field_value | Value of the Tag field | yes |

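
To illustrate the schema, the sketch below writes an upsert CSV with the standard csv module and then groups its rows back into Tags. The project, dataset, table, and template names are hypothetical. The grouping function is only an illustration of how the file reads, assuming that rows sharing the same resource, template, and column describe a single Tag; it is not the package's internal code.

```python
import csv
import io

UPSERT_COLUMNS = ["linked_resource", "template_name", "column", "field_id", "field_value"]

# Hypothetical BigQuery table and Tag Template resource names.
TABLE = "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders"
TEMPLATE = "projects/my-project/locations/us-central1/tagTemplates/governance"

rows = [
    # Two fields of the same table-level Tag: one line per field.
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "",
     "field_id": "owner", "field_value": "data-team@example.com"},
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "",
     "field_id": "pii", "field_value": "false"},
    # A column-level Tag attached to the customer_email column.
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "customer_email",
     "field_id": "pii", "field_value": "true"},
]


def to_csv(rows, columns=UPSERT_COLUMNS):
    """Serialize rows into CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def group_tags(csv_text):
    """Illustrative only: collect fields per (resource, template, column)."""
    tags = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        resource = row.get("linked_resource") or row.get("entry_name")
        key = (resource, row["template_name"], row.get("column") or "")
        tags.setdefault(key, {})[row["field_id"]] = row["field_value"]
    return tags
```

When saving the output to disk, open the file with newline="" (for example, open("tags.csv", "w", newline="")) so the csv module controls line endings.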
  • SAMPLE INPUT
  1. See sample-input/upsert-tags for reference;
  2. Data Catalog Sample Tags (Google Sheets) can help you create and export a CSV file.
  • COMMANDS

Python + virtualenv

datacatalog-tags upsert --csv-file <CSV-FILE-PATH>
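
If you need to trigger the command from a Python script, a minimal sketch is to shell out to the documented datacatalog-tags CLI with subprocess. It uses only the upsert and delete subcommands and the --csv-file option shown in this README; the helper names are hypothetical.

```python
import shutil
import subprocess


def build_command(action, csv_file):
    """Build the datacatalog-tags argument list for 'upsert' or 'delete'."""
    if action not in ("upsert", "delete"):
        raise ValueError(f"Unsupported action: {action}")
    return ["datacatalog-tags", action, "--csv-file", csv_file]


def run(action, csv_file):
    """Run the CLI, failing early if it is not on PATH."""
    if shutil.which("datacatalog-tags") is None:
        raise RuntimeError("datacatalog-tags not found; is the virtualenv active?")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_command(action, csv_file), check=True)
```

Passing an argument list instead of a shell string avoids quoting problems with file paths that contain spaces.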

Docker

docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <CSV-FILE-FOLDER>:/data \
  datacatalog-tag-manager upsert --csv-file /data/<CSV-FILE-PATH>
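
The docker run command mounts two host folders: one holding the credentials file (assumed to keep the name datacatalog-tag-manager.json from step 1.3.2) and one holding the CSV file. The hypothetical helper below assembles the same command as an argument list and converts the host folders to absolute paths, which is the safe choice for bind mounts.

```python
import os


def docker_run_argv(credentials_folder, csv_folder, csv_file, action="upsert"):
    """Assemble the docker run command from this README as a list."""
    creds = os.path.abspath(credentials_folder)
    data = os.path.abspath(csv_folder)
    return [
        "docker", "run", "--rm", "--tty",
        "--volume", f"{creds}:/credentials",
        "--volume", f"{data}:/data",
        # Image tag built by: docker build --rm --tag datacatalog-tag-manager .
        "datacatalog-tag-manager", action,
        # csv_file is relative to csv_folder, which is mounted at /data.
        "--csv-file", f"/data/{csv_file}",
    ]
```

The resulting list can be passed to subprocess.run(argv, check=True); use action="delete" for the command in section 2.2.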

2.2. Delete

2.2.1. From a CSV file

  • SCHEMA

The metadata schema to delete Tags is presented below. Use as many lines as needed to delete all the Tags you want.

| Column | Description | Mandatory |
| --- | --- | --- |
| linked_resource OR entry_name | Full name of the BigQuery or Pub/Sub asset the Entry refers to, or the Entry name if you are working with Custom Entries | yes |
| template_name | Resource name of the Tag Template of the Tag | yes |
| column | Delete Tags from a column belonging to the Entry schema | no |

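
Because the upsert and delete files share some columns but not all, it can be worth checking a file's header against the mandatory columns from the two schema tables before running either command. The sketch below is a hypothetical pre-flight check, not a feature of the package; it treats linked_resource and entry_name as alternatives, as both tables do.

```python
import csv
import io

# Mandatory columns per operation, taken from the schema tables above.
# Either linked_resource or entry_name must also be present.
REQUIRED = {
    "upsert": {"template_name", "field_id", "field_value"},
    "delete": {"template_name"},
}


def missing_columns(csv_text, action):
    """Return the mandatory columns absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = {name.strip() for name in header}
    missing = sorted(REQUIRED[action] - columns)
    if not columns & {"linked_resource", "entry_name"}:
        missing.insert(0, "linked_resource OR entry_name")
    return missing
```

An empty result means the header satisfies the schema; it does not validate the row values themselves.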
  • SAMPLE INPUT
  1. See sample-input/delete-tags for reference;
  2. Data Catalog Sample Tags (Google Sheets) can help you create and export a CSV file.
  • COMMANDS

Python + virtualenv

datacatalog-tags delete --csv-file <CSV-FILE-PATH>
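
Before running a delete, you may want to preview which Tags a file targets. The hypothetical sketch below reads the delete CSV and lists each distinct (resource, template, column) combination once, in file order, with None standing for a Tag attached to the Entry itself rather than to a column.

```python
import csv
import io


def delete_targets(csv_text):
    """List unique (resource, template, column) triples, in file order."""
    targets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        resource = row.get("linked_resource") or row.get("entry_name")
        # An empty column cell means the Tag is not attached to a column.
        target = (resource, row["template_name"], row.get("column") or None)
        if target not in targets:
            targets.append(target)
    return targets
```

Printing the result is a cheap dry run: it makes duplicate lines and unexpected resources visible before any Tag is removed.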

Docker

docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <CSV-FILE-FOLDER>:/data \
  datacatalog-tag-manager delete --csv-file /data/<CSV-FILE-PATH>