DataFrame Literal
A library for turning strings into dataframes which supports both Pandas and PySpark.
To create a dataframe, a header is required: the first row must contain both the name and the type of each column, written as `name (type)`. All following rows are treated as data rows.
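To make the header format concrete, here is a minimal sketch of how a table string could be split into a typed header and raw data rows. The names `parse_header` and `parse_table` are hypothetical and are not part of dataframe_literal; the library's own parsing may differ.

```python
import re

# Hypothetical helpers for illustration only, not the dataframe_literal API.
# A header cell is assumed to look like "name (type)", e.g. "a (str)".
HEADER_CELL = re.compile(r"^(?P<name>[^()]+?)\s*\((?P<type>\w+)\)$")


def split_row(line):
    """Split '| x | y |' into ['x', 'y'], trimming whitespace around cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_header(line):
    """Return a list of (column name, type name) pairs from the header row."""
    columns = []
    for cell in split_row(line):
        match = HEADER_CELL.match(cell)
        if match is None:
            raise ValueError(f"Header cell {cell!r} is not of the form 'name (type)'")
        columns.append((match.group("name"), match.group("type")))
    return columns


def parse_table(text):
    """Treat the first non-empty line as the header and the rest as data rows."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = parse_header(lines[0])
    rows = [split_row(line) for line in lines[1:]]
    return header, rows


header, rows = parse_table(
    """
    | a (str) | b (int) |
    | aaa     | 123     |
    | bbb     | 456     |
    """
)
```

At this stage every data cell is still a string; converting it to the declared column type is a separate step.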
Install
Basic
pip install dataframe_literal
Extras:
The extras also install the supporting libraries. Pick the one that matches whether you are using PySpark or Pandas, or use `all` for both.
pip install dataframe_literal[all]
pip install dataframe_literal[pyspark]
pip install dataframe_literal[pandas]
PySpark
Getting Started
The simplest way to create a DataFrame is:
from dataframe_literal.spark import dataframe
df = dataframe(
"""
| a (str) | b (int) | c (bool) | d (date) | e (timestamp) |
| aaa | 123 | True | 2019-10-10 | 2019-10-20 10:11:12 |
| aaa | 123 | False | 2019-10-10 | 2019-10-20 10:11:12 |
"""
)
df.printSchema()
df.show()
This uses an existing SparkSession, or creates a new one if none exists, and then constructs a DataFrame with the following schema:
root
|-- a: string (nullable = true)
|-- b: integer (nullable = true)
|-- c: boolean (nullable = true)
|-- d: date (nullable = true)
|-- e: timestamp (nullable = true)
Supported datatypes (`T` refers to `pyspark.sql.types`):
- int: T.IntegerType
- integer: T.IntegerType
- str: T.StringType
- string: T.StringType
- bool: T.BooleanType
- boolean: T.BooleanType
- date: T.DateType
- timestamp: T.TimestampType
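As an illustration of what each type name implies for the cell text, the sketch below converts a raw cell string into a plain Python value. The `CONVERTERS` table and `convert_cell` function are hypothetical, and the accepted formats (`True`/`False`, `YYYY-MM-DD`, `YYYY-MM-DD HH:MM:SS`) are assumptions taken from the example table above rather than from the library's source.

```python
from datetime import date, datetime


def _to_bool(text):
    # Assumption: booleans are written exactly as "True" or "False".
    if text not in ("True", "False"):
        raise ValueError(f"Expected 'True' or 'False', got {text!r}")
    return text == "True"


def _to_timestamp(text):
    # Assumption: timestamps follow the "YYYY-MM-DD HH:MM:SS" layout.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Hypothetical lookup from header type name to a converter function.
CONVERTERS = {
    "int": int,
    "integer": int,
    "str": str,
    "string": str,
    "bool": _to_bool,
    "boolean": _to_bool,
    "date": date.fromisoformat,
    "timestamp": _to_timestamp,
}


def convert_cell(type_name, text):
    """Convert one cell string to the Python value for the given type name."""
    try:
        converter = CONVERTERS[type_name]
    except KeyError:
        raise ValueError(f"Unsupported type {type_name!r}") from None
    return converter(text)


value = convert_cell("timestamp", "2019-10-20 10:11:12")
```

Both spellings of a type (`int` and `integer`, for example) share one converter, mirroring the way both spellings map to the same Spark type in the list above.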
Advanced Usage
You can also pass in your own SparkSession using the `spark` argument:
from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe
spark = SparkSession.builder.getOrCreate()
dataframe(
...
spark=spark
)
You can also create nested PySpark DataFrames by using dotted column names in the header, for example:
from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe
spark = SparkSession.builder.getOrCreate()
dataframe(
data="""
| a.col1 (str) | a.col2 (str) | b.col1 (str) | c.col1 (str) | d (str) |
| aaa | bbb | ccc | ddd | eee |
| aaa | bbb | ccc | ddd | eee |
""",
spark=spark
)
This will construct a DataFrame with the following schema:
root
|-- a: struct (nullable = true)
| |-- col1: string (nullable = true)
| |-- col2: string (nullable = true)
|-- b: struct (nullable = true)
| |-- col1: string (nullable = true)
|-- c: struct (nullable = true)
| |-- col1: string (nullable = true)
|-- d: string (nullable = true)
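The grouping of dotted column names into structs can be pictured with a small pure-Python sketch. The `nest_row` function is hypothetical and only shows the idea of folding `a.col1` and `a.col2` under a shared parent `a`; it is not how dataframe_literal is necessarily implemented.

```python
def nest_row(names, values):
    """Fold dotted column names into nested dicts.

    'a.col1' and 'a.col2' end up under the same parent key 'a',
    while a name without a dot stays a top-level key.
    """
    nested = {}
    for name, value in zip(names, values):
        parts = name.split(".")
        target = nested
        # Walk (or create) one dict per parent segment of the name.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


names = ["a.col1", "a.col2", "b.col1", "c.col1", "d"]
row = nest_row(names, ["aaa", "bbb", "ccc", "ddd", "eee"])
```

Each nested dict here corresponds to one `struct` in the schema above, and each leaf to a `string` field.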
Pandas
Coming soon.