GiantPandas
Convenient functions and connectors for Pandas.
Install with pip
$ pip install GiantPandas
Usage
- Import the library:

from GiantPandas import PandasOps, ExcelConnector, PsqlConnector, S3Connector

- Each imported submodule provides several functions; refer to its help for details.
PandasOps

Methods:

- PandasOps.get_row_count(dataframe): get the row count of a dataframe
- PandasOps.get_dict_from_two_columns(dataframe, key_column, value_column, keep_duplicate_keys): build a dictionary from two dataframe columns
- PandasOps.get_dataframe_with_all_permutations_from_dict(dict_with_list_values): create a dataframe with all possible permutations of a dict whose values are lists
- PandasOps.set_column_as_index(dataframe, column_name, drop_original_column): set a column as the index
- PandasOps.get_dict_of_column_name_to_type(dataframe): get a dict mapping column names to their dtypes
- PandasOps.get_column_names_by_type(dataframe, column_dtype): get all columns of the desired dtype
- PandasOps.contains_all_integer_in_float_column(dataframe, column_name): check whether all non-NaN values in a float column are integers
- PandasOps.set_column_names_to_alpha_numeric(dataframe): convert column names to alphanumeric
- PandasOps.set_column_names_to_snake_case(dataframe): convert column names to snake case
- PandasOps.exists_unnamed_headers(dataframe): check whether a dataframe contains any unnamed headers
- PandasOps.exists_column(dataframe, column_name_list): check whether a dataframe contains the desired columns
- PandasOps.get_maximum_length_of_dtype_object_values(dataframe, column_name): get the maximum length among values in an object-dtype column
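The two dict helpers are the least obvious from their names alone. The sketch below shows, in plain pandas, the behaviour their descriptions imply — these are assumed equivalents for illustration, not the library's actual code.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["ant", "bee", "cat"]})

# PandasOps.get_dict_from_two_columns(df, "id", "name", keep_duplicate_keys)
# presumably reduces to zipping the two columns (here, later keys would win):
id_to_name = dict(zip(df["id"], df["name"]))  # maps 1 -> 'ant', 2 -> 'bee', ...

# PandasOps.get_dataframe_with_all_permutations_from_dict(...) presumably
# builds the cross product of the dict's list values, one column per key:
spec = {"color": ["red", "blue"], "size": ["S", "M"]}
all_rows = pd.DataFrame(list(product(*spec.values())), columns=list(spec))
print(len(all_rows))  # 4
```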
ExcelConnector

Methods:

- ExcelConnector.get_sheet_names(file): get all sheet names
- ExcelConnector.get_dataframe_from_excel(file, sheet_name, skip_rows_list): read an Excel sheet into a dataframe
- ExcelConnector.send_dataframe_to_excel(file, dataframe_to_sheet_name_tuple_list, write_index): write dataframes to Excel sheets
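For orientation, here is a round trip sketched in plain pandas, which these helpers presumably wrap (writing .xlsx requires the openpyxl engine); the mapping to pandas calls is an assumption, not the library's documented internals.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
df = pd.DataFrame({"id": [1, 2], "name": ["ant", "bee"]})

# send_dataframe_to_excel(file, [(dataframe, sheet_name)], write_index)
# presumably calls DataFrame.to_excel once per (dataframe, sheet_name) pair:
df.to_excel(path, sheet_name="Sheet1", index=False)

# get_sheet_names(file) and get_dataframe_from_excel(file, sheet_name,
# skip_rows_list) presumably map to ExcelFile.sheet_names and read_excel:
sheet_names = pd.ExcelFile(path).sheet_names
round_trip = pd.read_excel(path, sheet_name="Sheet1", skiprows=[])
```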
PsqlConnector
First, create an instance to establish a connection.
psql_connector = PsqlConnector(
host="localhost",
dbname="postgres",
username="postgres",
password="##########",
port="5432",
)
Methods:

- psql_connector.get_query_results(query): get the results of a PSQL query as a dataframe
- psql_connector.upload_dataframe(dataframe, schema_name, table_name, if_exists): upload a dataframe to PSQL
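Both methods presumably wrap pandas' own SQL I/O. The self-contained sketch below uses an in-memory SQLite database in place of PostgreSQL purely so it can run anywhere; the semantics shown (to_sql with if_exists, read_sql returning a dataframe) are assumed equivalents.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2], "name": ["ant", "bee"]})

# upload_dataframe(dataframe, schema_name, table_name, if_exists) presumably
# wraps DataFrame.to_sql; SQLite has no schemas, so schema_name is omitted here.
df.to_sql("test_table", conn, if_exists="replace", index=False)

# get_query_results(query) presumably wraps pandas.read_sql:
result = pd.read_sql("SELECT * FROM test_table WHERE id = 2", conn)
print(result["name"].iloc[0])  # bee
```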
S3Connector
AWS credentials are fetched by boto3. Specific credentials can be passed when initializing S3Connector. To configure the client connection, pass a botocore.config.Config object.
s3_connector = S3Connector(
aws_access_key_id="############",
aws_secret_access_key="############",
aws_region="############",
config=botocore.config.Config(
connect_timeout=60,
read_timeout=60,
),
)
Methods:

- s3_connector.upload_dataframe_as_csv(dataframe, bucket, object_key, csv_sep, csv_null_identifier, include_header, include_index, encoding): upload a pandas dataframe as a CSV file to an S3 bucket
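The many parameters map naturally onto DataFrame.to_csv. The sketch below shows the assumed flow — serialize to CSV in memory, then upload the bytes — with the boto3 call commented out so the snippet runs without AWS credentials; bucket and key names are placeholders.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["ant", None]})

# csv_sep, csv_null_identifier, include_header, and include_index presumably
# correspond to to_csv's sep, na_rep, header, and index arguments:
buffer = io.StringIO()
df.to_csv(buffer, sep=",", na_rep="NULL", header=True, index=False)
csv_body = buffer.getvalue()

# The upload step would then be something like (placeholder bucket/key):
# import boto3
# boto3.client("s3").put_object(Bucket="my-bucket", Key="data/table.csv",
#                               Body=csv_body.encode("utf-8"))
print(csv_body.splitlines()[0])  # id,name
```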
Demo
Demo script for saving the results of a PostgreSQL query into an Excel file.
$ python3 demo/Psql2Excel.py -f table_from_psql.xlsx -H localhost -d postgres -u postgres -t test_table -sn Sheet1
Demo script for uploading a table from an Excel sheet into a PSQL database.
$ python3 demo/Excel2Psql.py -f tests/table_to_psql.xlsx -H localhost -d postgres -u postgres -t test_table -ie replace
© 2020, Samyak Ratna Tamrakar.