gbqschema-converter

Library to convert Google BigQuery Table Schema into Json Schema


Keywords
bigquery-schema, google-bigquery, google-cloud-platform, jsonschema, python3
License
MIT
Install
pip install gbqschema-converter==1.2.1

Documentation

Google BigQuery Table Schema Converter

license pyversion coverage test

Python library to convert Google BigQuery table schema into draft-07 json schema and vice versa.

The library includes two main modules:

gbqschema_converter
├── gbqschema_to_jsonschema.py
└── jsonschema_to_gbqschema.py

Each of those modules has two main functions:

  • json_representation: corresponds to json output (input for gbqschema_to_jsonschema).
  • sdk_representation: corresponds to Google Python SDK format output (input for gbqschema_to_jsonschema).

Installation

python3 -m venv env && source ${PWD}/env/bin/activate
(env) pip install --no-cache-dir gbqschema_converter

Usage: CLI

Convert json-schema to GBQ table schema

(env) json2gbq -h
usage: Google BigQuery table schema converter. [-h] (-i INPUT | -f FILE)

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input object as string.
  -f FILE, --file FILE  Input object as file path.

Example: stdin

Execution:

(env) json2gbq -i '{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element"
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
          "description": "Att 2"
        },
        "att_03": {
          "type": "string"
        },
        "att_04": {
          "type": "boolean"
        },
        "att_05": {
          "type": "string",
          "format": "date"
        },
        "att_06": {
          "type": "string",
          "format": "date-time"
        },
        "att_07": {
          "type": "string",
          "format": "time"
        }
      },
      "required": [
        "att_02"
      ]
    }
  }
}'

Output:

2020-04-08 21:42:51.700 [INFO ] [GBQ Table Schema Converter] Output (5.52 ms elapsed):
[
  {
    "description": "Att 1",
    "name": "att_01",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "Att 2",
    "name": "att_02",
    "type": "NUMERIC",
    "mode": "REQUIRED"
  },
  {
    "name": "att_03",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "att_04",
    "type": "BOOLEAN",
    "mode": "NULLABLE"
  },
  {
    "name": "att_05",
    "type": "DATE",
    "mode": "NULLABLE"
  },
  {
    "name": "att_06",
    "type": "DATETIME",
    "mode": "NULLABLE"
  },
  {
    "name": "att_07",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
  }
]

Example: file

Execution:

(env) json2gbq -f ${PWD}/data/jsonschema.json

Output:

2020-04-08 21:57:25.516 [INFO ] [GBQ Table Schema Converter] Output (6.39 ms elapsed):
[
  {
    "description": "Att 1",
    "name": "att_01",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "Att 2",
    "name": "att_02",
    "type": "NUMERIC",
    "mode": "REQUIRED"
  },
  {
    "name": "att_03",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "att_04",
    "type": "BOOLEAN",
    "mode": "NULLABLE"
  },
  {
    "name": "att_05",
    "type": "DATE",
    "mode": "NULLABLE"
  },
  {
    "name": "att_06",
    "type": "DATETIME",
    "mode": "NULLABLE"
  },
  {
    "name": "att_07",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
  }
]

Convert GBQ table schema to json-schema

(env) gbq2json -h
usage: Google BigQuery table schema converter. [-h] (-i INPUT | -f FILE)

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input object as string.
  -f FILE, --file FILE  Input object as file path.

Example: stdin

Execution:

(env) gbq2json -i '[
  {
    "description": "Att 1",
    "name": "att_01",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "Att 2",
    "name": "att_02",
    "type": "NUMERIC",
    "mode": "REQUIRED"
  },
  {
    "name": "att_03",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "att_04",
    "type": "BOOLEAN",
    "mode": "NULLABLE"
  },
  {
    "name": "att_05",
    "type": "DATE",
    "mode": "NULLABLE"
  },
  {
    "name": "att_06",
    "type": "DATETIME",
    "mode": "NULLABLE"
  },
  {
    "name": "att_07",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
  }
]'

Output:

2020-04-08 21:51:05.370 [INFO ] [GBQ Table Schema Converter] Output (1.08 ms elapsed):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element"
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
          "description": "Att 2"
        },
        "att_03": {
          "type": "string"
        },
        "att_04": {
          "type": "boolean"
        },
        "att_05": {
          "type": "string",
          "format": "date"
        },
        "att_06": {
          "type": "string",
          "format": "date-time"
        },
        "att_07": {
          "type": "string",
          "format": "time"
        }
      },
      "additionalProperties": false,
      "required": [
        "att_02"
      ]
    }
  }
}

Example: file

Execution:

(env) gbq2json -f ${PWD}/data/gbqschema.json

Output:

2020-04-08 21:55:20.275 [INFO ] [GBQ Table Schema Converter] Output (1.72 ms elapsed):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element"
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
          "description": "Att 2"
        },
        "att_03": {
          "type": "string"
        },
        "att_04": {
          "type": "boolean"
        },
        "att_05": {
          "type": "string",
          "format": "date"
        },
        "att_06": {
          "type": "string",
          "format": "date-time"
        },
        "att_07": {
          "type": "string",
          "format": "time"
        }
      },
      "additionalProperties": false,
      "required": [
        "att_02"
      ]
    }
  }
}

Usage: python program

Convert json-schema to GBQ table schema

Example: output as json

from gbqschema_converter.jsonschema_to_gbqschema import json_representation as converter

schema_in = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element",
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
        },
      }
      "required": [
        "att_02",
      ],
    },
  },
}

schema_out = converter(schema_in)
print(schema_out)

Output:

[{'description': 'Att 1', 'name': 'att_01', 'type': 'INTEGER', 'mode': 'NULLABLE'}, {'name': 'att_02', 'type': 'NUMERIC', 'mode': 'REQUIRED'}]

Example: output as list of SchemaField (SDK format)

from gbqschema_converter.jsonschema_to_gbqschema import sdk_representation as converter

schema_in = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element",
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
        },
      },
      "required": [
        "att_02",
      ],
    },
  },
}

schema_out = converter(schema_in)
print(schema_out)

Output:

[SchemaField('att_01', 'INTEGER', 'NULLABLE', 'Att 1', ()), SchemaField('att_02', 'NUMERIC', 'REQUIRED', None, ())]

Convert GBQ table schema to json-schema

Example: output as json

from gbqschema_converter.gbqschema_to_jsonschema import json_representation as converter

schema_in = [
    {
        'description': 'Att 1',
        'name': 'att_01',
        'type': 'INTEGER',
        'mode': 'NULLABLE'
    },
    {
        'name': 'att_02',
        'type': 'NUMERIC',
        'mode': 'REQUIRED'
    }
]

schema_out = converter(schema_in)
print(schema_out)

Output:

{'$schema': 'http://json-schema.org/draft-07/schema#', 'type': 'array', 'items': {'$ref': '#/definitions/element'}, 'definitions': {'element': {'type': 'object', 'properties': {'att_01': {
'type': 'integer', 'description': 'Att 1'}, 'att_02': {'type': 'number'}}, 'additionalProperties': False, 'required': ['att_02']}}}

Example: output as list of SchemaField (SDK format)

from gbqschema_converter.gbqschema_to_jsonschema import sdk_representation as converter
from google.cloud.bigquery import SchemaField

schema_in = [
    SchemaField('att_01', 'INTEGER', 'NULLABLE', 'Att 1', ()),
    SchemaField('att_02', 'NUMERIC', 'REQUIRED', None, ()),
]

schema_out = converter(schema_in)
print(schema_out)

Output:

{'$schema': 'http://json-schema.org/draft-07/schema#', 'type': 'array', 'items': {'$ref': '#/definitions/element'}, 'definitions': {'element': {'type': 'object', 'properties': {'att_01': {
'type': 'integer', 'description': 'Att 1'}, 'att_02': {'type': 'number'}}, 'additionalProperties': False, 'required': ['att_02']}}}