easy-glue

✨ This package helps you use AWS Glue easily.


Keywords
aws, aws-glue, easy, glue
License
MIT
Install
pip install easy-glue==0.0.1

Easy Glue


This package helps you use AWS Glue easily.

🧐 About

This package provides the following functions: deploy and run_crawler.

🏁 Getting Started

Installing

  • If you want to save data in Parquet format, also install pandas and fastparquet.
pip install easy_glue

Prerequisites

1. (Required) Create Handler

Use this code to create a handler.

import easy_glue

bucket_name = "YOUR BUCKET NAME"

# You don't need to pass these parameters if your credentials are already configured in ~/.aws/config.
aws_access_key_id = "YOUR AWS ACCESS KEY ID"
aws_secret_access_key = "YOUR AWS SECRET ACCESS KEY"
region_name = "YOUR AWS REGION"

# This directory must already exist; create it before creating the handler.
jobs_base_dir = "A PLACE TO STORE YOUR JOB SCRIPTS"

handler = easy_glue.EasyGlue(
    bucket_name,
    jobs_base_dir=jobs_base_dir,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name,
)

print(handler)

result:

<easy_glue.EasyGlue object at 0x016EE7F0>
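
If you would rather not hard-code credentials, one option is to read them from environment variables and pass only the ones that are set. The sketch below is an assumption, not part of easy_glue: build_handler_kwargs is a hypothetical helper, and the environment variable names are the ones the AWS CLI and boto3 conventionally use. The keyword names match the handler call shown above.

```python
import os


def build_handler_kwargs(jobs_base_dir, env=None):
    """Collect EasyGlue keyword arguments, skipping credentials that are not set.

    Hypothetical helper: it only builds a dict and never touches AWS.
    """
    env = os.environ if env is None else env
    kwargs = {"jobs_base_dir": jobs_base_dir}
    # Keyword name used by the handler -> conventional AWS environment variable.
    mapping = {
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "region_name": "AWS_DEFAULT_REGION",
    }
    for param, var in mapping.items():
        value = env.get(var)
        if value:  # leave unset values out so the default lookup applies
            kwargs[param] = value
    return kwargs


if __name__ == "__main__":
    print(sorted(build_handler_kwargs("jobs")))
```

The resulting dict could then be unpacked into the constructor, for example easy_glue.EasyGlue(bucket_name, **build_handler_kwargs(jobs_base_dir)), assuming the credential parameters are optional as the comment in the code above suggests.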

🎈 Usage

Please check Prerequisites before starting Usage.

🌱 deploy

Use this function to deploy a job to AWS Glue.

Tutorial

  1. Create a directory sample_job in YOUR_JOBS_BASE_DIR.

  2. Create a py file index.py in YOUR_JOBS_BASE_DIR/sample_job.

  3. Write Spark code in YOUR_JOBS_BASE_DIR/sample_job/index.py.

  4. Deploy sample_job using the code below.

    >>> print(handler.deploy("sample_job"))

    Execution Result:

    {'Name': 'sample_job', 'ResponseMetadata': {'RequestId': 'e436b350-7b36-47f4-b663-df52a058c2cb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 10 Aug 2020 03:53:56 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '21', 'connection': 'keep-alive', 'x-amzn-requestid': 'e436b350-7b36-47f4-b663-df52a058c2cb'}, 'RetryAttempts': 0}}
  5. You can find the deployed job in the AWS Glue console.

    https://ap-northeast-2.console.aws.amazon.com/glue/home?2#etl:tab=jobs
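
Steps 1 to 3 of the tutorial can also be scripted. The following is a minimal sketch that creates the sample_job/index.py layout with pathlib; scaffold_job and the placeholder Spark script are hypothetical and only illustrate the directory structure described above.

```python
import tempfile
from pathlib import Path


def scaffold_job(jobs_base_dir, job_name, script_body):
    """Create <jobs_base_dir>/<job_name>/index.py and return its path."""
    job_dir = Path(jobs_base_dir) / job_name
    job_dir.mkdir(parents=True, exist_ok=True)
    script_path = job_dir / "index.py"
    script_path.write_text(script_body, encoding="utf-8")
    return script_path


# Placeholder Spark script (hypothetical content, replace with your own job).
SAMPLE_SCRIPT = """from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(10).show()
"""

if __name__ == "__main__":
    # A temporary directory stands in for YOUR_JOBS_BASE_DIR here.
    base_dir = tempfile.mkdtemp()
    print(scaffold_job(base_dir, "sample_job", SAMPLE_SCRIPT))
```

After the file exists, handler.deploy("sample_job") can be called as in step 4.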

Parameters

Returns

  • Create job result: dict
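
Because the returned dict carries the HTTP status in its ResponseMetadata, a script can check whether the deployment went through. This is a small sketch based on the execution result shown in the tutorial; is_success is a hypothetical helper, not part of easy_glue.

```python
def is_success(result):
    """Return True when the response metadata reports HTTP 200."""
    metadata = result.get("ResponseMetadata", {})
    return metadata.get("HTTPStatusCode") == 200


# Trimmed copy of the deploy result shown in the tutorial.
sample_result = {
    "Name": "sample_job",
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 0},
}

if __name__ == "__main__":
    print(is_success(sample_result))
```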

🌱 run_crawler

Use this function to run a crawler.

Parameters

  • (required) crawler_name: str

Returns

  • Start crawler result: dict
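
As an illustration, several crawlers could be started in sequence by calling run_crawler once per name. The sketch below assumes only what is documented here (one required str argument, a dict result). run_crawlers and StubHandler are hypothetical; the stub stands in for a real handler so the example runs without AWS access, and the shape of its return value is made up.

```python
def run_crawlers(handler, crawler_names):
    """Start each crawler in turn and map its name to the returned dict."""
    results = {}
    for name in crawler_names:
        if not isinstance(name, str) or not name:
            raise ValueError(f"crawler_name must be a non-empty str, got {name!r}")
        results[name] = handler.run_crawler(name)
    return results


class StubHandler:
    """Hypothetical stand-in for easy_glue.EasyGlue, for offline demonstration only."""

    def run_crawler(self, crawler_name):
        # Made-up response; a real call returns the start crawler result.
        return {"crawler": crawler_name, "ResponseMetadata": {"HTTPStatusCode": 200}}


if __name__ == "__main__":
    print(run_crawlers(StubHandler(), ["sales_crawler", "users_crawler"]))
```

With a real handler created as described in Prerequisites, pass that handler instead of StubHandler().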

🎉 Acknowledgements

  • Title icon made by Freepik.

  • If you have a problem, please open an issue.

  • Please help develop this project 😀

  • Thanks for reading 😄