dataframe-literal

A library for turning strings into dataframes


License
MIT
Install
pip install dataframe-literal==0.1.3

Documentation

DataFrame Literal

A library for turning strings into dataframes, which supports both Pandas and PySpark.

To create a dataframe, the first row must be a header that gives each column's name and type, written as name (type). Every following row is treated as a data row.
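The header convention above can be illustrated with a small standalone sketch. This is not the library's implementation: `split_row`, `parse_header`, and `parse_literal` are hypothetical helpers written only to show how a `name (type)` header row and the data rows beneath it might be separated.

```python
import re

# Hypothetical helpers for illustration; not part of dataframe_literal's API.
# A header cell is assumed to look like "name (type)", e.g. "a (str)".
HEADER_CELL = re.compile(r"^(?P<name>[\w.]+)\s*\((?P<type>\w+)\)$")


def split_row(line):
    """Split a '| x | y |' row into a list of stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_header(line):
    """Return a list of (column name, type name) pairs for a header row."""
    columns = []
    for cell in split_row(line):
        match = HEADER_CELL.match(cell)
        if match is None:
            raise ValueError(f"Header cell {cell!r} is not of the form 'name (type)'")
        columns.append((match.group("name"), match.group("type")))
    return columns


def parse_literal(text):
    """Treat the first non-blank line as the header and the rest as data rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = parse_header(lines[0])
    rows = [split_row(line) for line in lines[1:]]
    return header, rows


header, rows = parse_literal(
    """
    | a (str) | b (int) |
    | aaa     | 123     |
    | bbb     | 456     |
    """
)
```

In this sketch every cell is still a string after parsing. Converting cells to the declared column types would be a separate step.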

Install

Basic

pip install dataframe_literal

Extras:

The extras also install supporting libraries. Pick the one that matches whether you are using PySpark or Pandas.

pip install dataframe_literal[all]
pip install dataframe_literal[pyspark]
pip install dataframe_literal[pandas]

PySpark

Getting Started

The simplest way to create a DataFrame is:

from dataframe_literal.spark import dataframe

df = dataframe(
    """
    | a (str) | b (int) | c (bool) | d (date)   | e (timestamp)       |
    | aaa     | 123     | True     | 2019-10-10 | 2019-10-20 10:11:12 |
    | aaa     | 123     | False    | 2019-10-10 | 2019-10-20 10:11:12 |
    """
)
df.printSchema()
df.show()

This uses an existing SparkSession, or creates a new one, and then constructs a DataFrame with the following schema:

root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = true)
 |-- c: boolean (nullable = true)
 |-- d: date (nullable = true)
 |-- e: timestamp (nullable = true)

Supported datatypes:

  • int: T.IntegerType
  • integer: T.IntegerType
  • str: T.StringType
  • string: T.StringType
  • bool: T.BooleanType
  • boolean: T.BooleanType
  • date: T.DateType
  • timestamp: T.TimestampType
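To make the list above concrete, here is a rough pure-Python sketch of turning a cell's text into a value for each type name. It assumes `T` in the list refers to `pyspark.sql.types`, and it infers the accepted formats (`True`/`False`, `YYYY-MM-DD`, `YYYY-MM-DD HH:MM:SS`) from the example rows. The library's own parsing may differ. `CONVERTERS` and `convert_cell` are hypothetical names.

```python
from datetime import date, datetime


def _to_bool(text):
    """Accept only the literals used in the README example."""
    if text not in ("True", "False"):
        raise ValueError(f"Expected 'True' or 'False', got {text!r}")
    return text == "True"


def _to_timestamp(text):
    """Parse the 'YYYY-MM-DD HH:MM:SS' format used in the README example."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Hypothetical lookup table: type name from the header -> converter for cell text.
# Aliases (int/integer, str/string, bool/boolean) share a converter.
CONVERTERS = {
    "int": int,
    "integer": int,
    "str": str,
    "string": str,
    "bool": _to_bool,
    "boolean": _to_bool,
    "date": date.fromisoformat,
    "timestamp": _to_timestamp,
}


def convert_cell(type_name, text):
    """Convert one cell's text using the converter registered for type_name."""
    try:
        converter = CONVERTERS[type_name]
    except KeyError:
        raise ValueError(f"Unsupported type name: {type_name!r}") from None
    return converter(text)


row = [
    convert_cell("str", "aaa"),
    convert_cell("int", "123"),
    convert_cell("bool", "False"),
    convert_cell("date", "2019-10-10"),
    convert_cell("timestamp", "2019-10-20 10:11:12"),
]
```

Keeping the aliases in one table means a new type name can be added with a single entry.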

Advanced Usage

You can also pass in your own SparkSession using:

from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe

spark = SparkSession.builder.getOrCreate()
dataframe(
    ...
    spark=spark
)

Nested PySpark DataFrames can also be created by using dotted column names, such as a.col1:

from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe

spark = SparkSession.builder.getOrCreate()
dataframe(
    data="""
    | a.col1 (str) | a.col2 (str) | b.col1 (str) | c.col1 (str) | d (str) |
    | aaa          | bbb          | ccc          | ddd          | eee     |
    | aaa          | bbb          | ccc          | ddd          | eee     |
    """,
    spark=spark
)

This will construct a DataFrame with the following schema:

root
 |-- a: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |    |-- col2: string (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |-- c: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |-- d: string (nullable = true)
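The grouping of dotted names into structs can be sketched in plain Python. This is an illustration of the idea only, under the assumption of a single level of nesting as in the example. `nest_columns` and `nest_row` are hypothetical helpers, not functions from the library.

```python
def nest_columns(names):
    """Group dotted names by their prefix; plain columns map to None."""
    tree = {}
    for name in names:
        parent, _, child = name.partition(".")
        if child:
            tree.setdefault(parent, []).append(child)
        else:
            tree[parent] = None
    return tree


def nest_row(names, values):
    """Build a nested dict for one data row, mirroring the struct layout."""
    row = {}
    for name, value in zip(names, values):
        parent, _, child = name.partition(".")
        if child:
            row.setdefault(parent, {})[child] = value
        else:
            row[parent] = value
    return row


names = ["a.col1", "a.col2", "b.col1", "c.col1", "d"]
layout = nest_columns(names)
nested = nest_row(names, ["aaa", "bbb", "ccc", "ddd", "eee"])
```

Here `layout` mirrors the shape of the schema printed above: `a`, `b`, and `c` each hold a list of child fields, while `d` stays a plain column.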

Pandas

Coming soon.