faker-pyspark

faker-pyspark is a PySpark DataFrame and Schema provider for the Faker Python package.


Keywords: Faker, PySpark
License: MIT
Install: pip install faker-pyspark==0.8.0

Documentation

PySpark provider for Faker

faker-pyspark is a PySpark DataFrame and Schema (StructType) provider for the Faker Python package.

Description

faker-pyspark provides PySpark-based fake data for testing purposes. "Fake" in this context means "random": the data may look real, but I make no claims about its accuracy, so do not use it as real data!
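
As a rough illustration of what "random data that matches a schema" means, here is a minimal sketch that uses only the Python standard library. It does not call faker-pyspark or PySpark and is not how the provider is implemented; SCHEMA, random_value, and random_rows are hypothetical names invented for this example, and the four type tags are an assumption chosen to keep it short.

```python
import random
import string

# Hypothetical stand-in for a schema: column name -> type tag.
# This is NOT a PySpark StructType, only an illustration of the idea.
SCHEMA = {"id": "int", "name": "string", "score": "float", "active": "bool"}


def random_value(type_tag, rng):
    """Return one random value for an illustrative type tag."""
    if type_tag == "int":
        return rng.randint(0, 1000)
    if type_tag == "float":
        return rng.uniform(0.0, 100.0)
    if type_tag == "bool":
        return rng.random() < 0.5
    if type_tag == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    raise ValueError(f"unsupported type tag: {type_tag}")


def random_rows(schema, n, seed=None):
    """Build n rows (dicts) whose keys and value types follow the schema."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return [
        {column: random_value(tag, rng) for column, tag in schema.items()}
        for _ in range(n)
    ]


rows = random_rows(SCHEMA, 3, seed=42)
for row in rows:
    print(row)
```

The values have the right shape for each column but carry no meaning, which is why such data suits tests and nothing else. Passing the same seed yields the same rows, which helps when a test needs to be repeatable.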

Installation

Install with pip:

pip install faker-pyspark

Add as a provider to your Faker instance:

from faker import Faker
from faker_pyspark import PySparkProvider
fake = Faker()
fake.add_provider(PySparkProvider)

PySpark DataFrame, Schema and more

>>> df           = fake.pyspark_dataframe()
>>> schema       = fake.pyspark_schema()
>>> df_updated   = fake.pyspark_update_dataframe(df)
>>> column_names = fake.pyspark_column_names()
>>> data         = fake.pyspark_data_dict_using_schema(schema)
>>> data         = fake.pyspark_data_dict()

CLI faker

$ faker pyspark_schema       -i faker_pyspark
$ faker pyspark_dataframe    -i faker_pyspark
$ faker pyspark_column_names -i faker_pyspark
$ faker pyspark_data_dict    -i faker_pyspark