DataFrame Literal

A library for turning strings into dataframes, with support for both Pandas and PySpark.

To create a dataframe, the first row must be a header that gives each column's name and type. All following rows are treated as data rows.
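To make the table format concrete, here is a minimal pure-Python sketch of how such a string could be split into a typed header and data rows. The helpers `split_row`, `parse_header`, and `parse_table` are hypothetical names used for illustration only; they are not part of dataframe_literal, and the library's own parser may work differently.

```python
import re

# Hypothetical helpers for illustration; not the dataframe_literal API.

# Matches a header cell of the form "name (type)", e.g. "a (str)".
HEADER_CELL = re.compile(r"^(?P<name>[^()]+?)\s*\((?P<type>\w+)\)$")


def split_row(line):
    """Split one pipe-delimited row into its stripped cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_header(line):
    """Return a list of (column name, type name) pairs from the header row."""
    columns = []
    for cell in split_row(line):
        match = HEADER_CELL.match(cell)
        if match is None:
            raise ValueError(f"header cell {cell!r} is not of the form 'name (type)'")
        columns.append((match.group("name"), match.group("type")))
    return columns


def parse_table(text):
    """Treat the first non-blank line as the header and the rest as data rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = parse_header(lines[0])
    rows = [split_row(line) for line in lines[1:]]
    return header, rows
```

In this sketch every cell is still a string after parsing; converting cells to the declared types would be a separate step.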

Install

Basic

pip install dataframe_literal

Extras:

These also install some extra libraries. Pick the one that matches whether you are using PySpark or Pandas.

pip install dataframe_literal[all]
pip install dataframe_literal[pyspark]
pip install dataframe_literal[pandas]

PySpark

Getting Started

The simplest way to create a DataFrame is:

from dataframe_literal.spark import dataframe

df = dataframe(
    """
    | a (str) | b (int) | c (bool) | d (date)   | e (timestamp)       |
    | aaa     | 123     | True     | 2019-10-10 | 2019-10-20 10:11:12 |
    | aaa     | 123     | False    | 2019-10-10 | 2019-10-20 10:11:12 |
    """
)
df.printSchema()
df.show()

This uses an existing SparkSession, or creates a new one, and then constructs a DataFrame with the following schema:

root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = true)
 |-- c: boolean (nullable = true)
 |-- d: date (nullable = true)
 |-- e: timestamp (nullable = true)

Supported datatypes:

int: T.IntegerType
integer: T.IntegerType
str: T.StringType
string: T.StringType
bool: T.BooleanType
boolean: T.BooleanType
date: T.DateType
timestamp: T.TimestampType
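As an illustration of what the type names above mean for the cell text, the following sketch maps each name to a plain Python converter. This is an assumption-laden example, not the library's code: `CONVERTERS`, `parse_bool`, and `convert_row` are hypothetical names, and the date and timestamp formats are inferred from the Getting Started example rather than from any documented specification.

```python
from datetime import date, datetime


def parse_bool(text):
    """Accept only the literals used in the example table (an assumption)."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"cannot read {text!r} as a boolean")


# Hypothetical mapping from type name to a Python converter.
# Aliases (int/integer, str/string, bool/boolean) share one converter.
CONVERTERS = {
    "int": int,
    "integer": int,
    "str": str,
    "string": str,
    "bool": parse_bool,
    "boolean": parse_bool,
    "date": date.fromisoformat,  # assumes YYYY-MM-DD
    "timestamp": lambda text: datetime.strptime(text, "%Y-%m-%d %H:%M:%S"),
}


def convert_row(type_names, cells):
    """Convert each cell string using the converter for its column type."""
    return [CONVERTERS[name](cell) for name, cell in zip(type_names, cells)]
```

An unknown type name raises a `KeyError` here, which makes a typo in the header fail early instead of silently producing strings.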

Advanced Usage

You can also pass in your own SparkSession using:

from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe

spark = SparkSession.builder.getOrCreate()
dataframe(
    ...,
    spark=spark
)

You can also create nested PySpark DataFrames by using dotted column names:

from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe

spark = SparkSession.builder.getOrCreate()
dataframe(
    data="""
    | a.col1 (str) | a.col2 (str) | b.col1 (str) | c.col1 (str) | d (str) |
    | aaa          | bbb          | ccc          | ddd          | eee     |
    | aaa          | bbb          | ccc          | ddd          | eee     |
    """,
    spark=spark
)

This will construct a DataFrame with the following schema:

root
 |-- a: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |    |-- col2: string (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |-- c: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |-- d: string (nullable = true)
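The grouping of dotted column names into structs can be sketched in plain Python as follows. `nest_row` is a hypothetical helper written for this example and is not part of dataframe_literal; it only shows how names such as `a.col1` and `a.col2` end up under a shared parent `a`, mirroring the schema above.

```python
def nest_row(names, values):
    """Group dotted column names into nested dicts (illustrative only)."""
    nested = {}
    for name, value in zip(names, values):
        parts = name.split(".")
        target = nested
        # Walk or create the parent dicts for every part except the last.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


row = nest_row(
    ["a.col1", "a.col2", "b.col1", "c.col1", "d"],
    ["aaa", "bbb", "ccc", "ddd", "eee"],
)
```

Columns without a dot, such as `d`, stay at the top level, while each distinct prefix becomes one nested group.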

Pandas

Coming soon.

dataframe-literal
Release 0.1.3
