easy-glue 0.0.1 on PyPI

Easy Glue

This package helps you use AWS Glue easily.

📝 Table of Contents

About
Getting Started
Usage
Acknowledgements

🧐 About

You can use the following functions.

deploy
run_crawler

🏁 Getting Started

Installing

If you want to save data in Parquet format, also install pandas and fastparquet.

pip install easy_glue

Prerequisites

1. (Required) Create Handler

Use this code to create a handler.

import easy_glue

bucket_name = "YOUR BUCKET NAME"

# You don't need these parameters if your credentials are already configured in ~/.aws/config.
aws_access_key_id = "YOUR AWS ACCESS KEY ID"
aws_secret_access_key = "YOUR AWS SECRET ACCESS KEY"
region_name = "YOUR AWS REGION"

# This directory must already exist; create it before running this code.
jobs_base_dir = "YOUR DIRECTORY FOR STORING JOB SCRIPTS"

handler = easy_glue.EasyGlue(
    bucket_name,
    jobs_base_dir=jobs_base_dir,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name,
)

print(handler)

result:

<easy_glue.EasyGlue object at 0x016EE7F0>
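
If you would rather not hard-code credentials in the script, one option is to read them from environment variables. The sketch below is an illustration, not part of easy_glue: the helper name credentials_from_env is hypothetical, and it assumes the standard AWS variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. It only builds the keyword arguments passed to EasyGlue above.

```python
import os


def credentials_from_env(environ=None):
    """Collect the optional EasyGlue credential arguments from environment variables.

    Hypothetical helper (not part of easy_glue). Variables that are unset or
    empty are left out of the returned dict.
    """
    environ = os.environ if environ is None else environ
    mapping = {
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "region_name": "AWS_DEFAULT_REGION",
    }
    return {arg: environ[var] for arg, var in mapping.items() if environ.get(var)}


# Usage, assuming easy_glue is installed and the variables are exported:
# handler = easy_glue.EasyGlue(
#     bucket_name, jobs_base_dir=jobs_base_dir, **credentials_from_env()
# )
```

Leaving out unset variables is meant to match the note above that the parameters are unnecessary when your AWS configuration file is already in place.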

🎈 Usage

Complete the Prerequisites before following the Usage steps.

🌱 deploy

Use this function to deploy a job to AWS Glue.

Tutorial

Create a directory sample_job in YOUR_JOBS_BASE_DIR.
Create a Python file index.py in YOUR_JOBS_BASE_DIR/sample_job.
Write Spark code in YOUR_JOBS_BASE_DIR/sample_job/index.py.
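
The directory layout from the steps above can also be created from Python. This is a minimal sketch using only the standard library; scaffold_job is a hypothetical helper, not part of easy_glue, and the file it writes is only a placeholder for your own Spark code.

```python
from pathlib import Path


def scaffold_job(jobs_base_dir, job_name="sample_job"):
    """Create <jobs_base_dir>/<job_name>/index.py if it does not exist yet.

    Hypothetical helper (not part of easy_glue); returns the path to index.py.
    """
    job_dir = Path(jobs_base_dir) / job_name
    job_dir.mkdir(parents=True, exist_ok=True)  # also creates the base dir if missing
    script = job_dir / "index.py"
    if not script.exists():
        # Placeholder only: replace with your Spark code before deploying.
        script.write_text("# Write your Spark code here.\n", encoding="utf-8")
    return script


# Usage: scaffold_job("YOUR_JOBS_BASE_DIR")
```

An existing index.py is left untouched, so running the helper again does not overwrite your job script.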

Deploy sample_job with the code below.

>>> print(handler.deploy("sample_job"))

Execution Result:

{'Name': 'sample_job', 'ResponseMetadata': {'RequestId': 'e436b350-7b36-47f4-b663-df52a058c2cb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 10 Aug 2020 03:53:56 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '21', 'connection': 'keep-alive', 'x-amzn-requestid': 'e436b350-7b36-47f4-b663-df52a058c2cb'}, 'RetryAttempts': 0}}
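
To check the outcome in a script rather than by eye, you could inspect the HTTP status code in the returned dict. The sketch below relies only on the ResponseMetadata layout visible in the execution result above; deploy_succeeded is a hypothetical helper name, not part of easy_glue.

```python
def deploy_succeeded(result):
    """Return True if a response dict reports HTTP status 200.

    Hypothetical helper; it relies only on the 'ResponseMetadata' layout
    shown in the execution result above.
    """
    metadata = result.get("ResponseMetadata", {})
    return metadata.get("HTTPStatusCode") == 200


# Trimmed copy of the execution result shown above.
sample_result = {
    "Name": "sample_job",
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 0},
}
print(deploy_succeeded(sample_result))  # True
```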

You can find the deployed job in the Glue console.

https://ap-northeast-2.console.aws.amazon.com/glue/home?2#etl:tab=jobs

Parameters

(required) job_name: str

Name of the Glue job to be deployed.

max_capacity: int (default = 3)

Maximum capacity of the Glue workers.

timeout: int (default = 7200)

Timeout of the Glue job.

default_arguments: dict (default = {})

Default arguments of the Glue job. For details, see the link below.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
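
As an illustration of the parameters above, the sketch below overrides all three optional settings. It assumes that handler is an easy_glue.EasyGlue instance and that deploy accepts the documented parameters as keyword arguments; the wrapper name deploy_with_options and the chosen values are hypothetical. The --job-bookmark-option key is one of the special parameters described on the AWS page linked above.

```python
def deploy_with_options(handler, job_name):
    """Deploy a job with non-default settings.

    `handler` is assumed to be an easy_glue.EasyGlue instance; passing the
    documented parameters by keyword is an assumption of this sketch.
    """
    return handler.deploy(
        job_name,
        max_capacity=5,   # more capacity than the default of 3
        timeout=3600,     # lower than the default of 7200
        default_arguments={
            # Special Glue parameter that turns on job bookmarks.
            "--job-bookmark-option": "job-bookmark-enable",
        },
    )


# Usage, assuming `handler` was created as in Prerequisites:
# print(deploy_with_options(handler, "sample_job"))
```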

Returns

Create job result: dict

🌱 run_crawler

Use this function to run a Glue crawler.

Parameters

(required) crawler_name: str

Name of the Glue crawler to run.

Returns

Start crawler result: dict
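
Starting a crawler does not wait for it to finish. If you need to block until the crawl completes, one possible approach is to poll the crawler state through boto3. The sketch below is not part of easy_glue: wait_for_crawler is a hypothetical helper, and it assumes glue_client is a boto3 Glue client whose get_crawler call reports the state as READY, RUNNING, or STOPPING.

```python
import time


def wait_for_crawler(glue_client, crawler_name, poll_seconds=15, max_polls=40):
    """Poll until the crawler returns to the READY state.

    `glue_client` is assumed to be a boto3 Glue client, e.g.
    boto3.client("glue"); this helper is not part of easy_glue.
    Returns True if READY was observed, False if polling gave up.
    """
    for _ in range(max_polls):
        state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return True
        time.sleep(poll_seconds)
    return False


# Usage, assuming `handler` was created as in Prerequisites:
# handler.run_crawler("YOUR CRAWLER NAME")
# import boto3
# finished = wait_for_crawler(boto3.client("glue"), "YOUR CRAWLER NAME")
```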

🎉 Acknowledgements

Title icon made by Freepik.
If you have a problem, please open an issue.
Contributions to this project are welcome 😀
Thanks for reading 😄
