databuilder

A tool for quickly generating dummy data.


License
MIT
Install
pip install databuilder==0.0.2

Documentation

DataBuilder

Have you ever needed dummy data to demonstrate basic data analysis or machine learning topics?

DataBuilder can save you time by creating customized dummy data sets within minutes.


Installation

pip install databuilder

Quick Example

import databuilder as db

# make a dummy dataset about "our employees"
config = {
    'fields': {
        'empID':        db.ID(),
        'first_name':   db.Name(first_only=True),
        'last_name':    db.Name(last_only=True),
        'department':   db.Group(["Sales", "Acct", "Mktg", "IT"]),
        'salary':       db.NormalDist(50000, 10000),
        'hire_date':    db.Date("1990-01-01", "2020-12-31")
    }
}

# create a pandas DataFrame with
# the fields defined in `config`
df = db.create_df(config, n=200)

print(df.head(2))
#
#   Example output:
#         empID first_name last_name department  salary  hire_date
#      0      1      Frank      Ward         IT   69210 2004-05-05
#      1      2    Barbara    George       Mktg   46744 2019-05-20
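
Once the DataFrame exists, it can be used like any other pandas DataFrame. The sketch below shows the kind of basic analysis such a data set is meant for. It does not call DataBuilder: the frame is a hand-typed stand-in containing only the two rows from the example output above, and the variable names `mean_salary` and `hires_per_year` are illustrative assumptions, not part of the library.

```python
import pandas as pd

# Hand-typed stand-in for the DataFrame shown in the example output
# (only its two rows). In practice this frame would come from
# db.create_df(config, n=200).
df = pd.DataFrame({
    "empID": [1, 2],
    "first_name": ["Frank", "Barbara"],
    "last_name": ["Ward", "George"],
    "department": ["IT", "Mktg"],
    "salary": [69210, 46744],
    "hire_date": pd.to_datetime(["2004-05-05", "2019-05-20"]),
})

# Average salary per department.
mean_salary = df.groupby("department")["salary"].mean()

# Number of hires per calendar year, ordered by year.
hires_per_year = df["hire_date"].dt.year.value_counts().sort_index()

print(mean_salary)
print(hires_per_year)
```

With a full 200-row data set the same two statements would give per-department averages and a hiring timeline, which is usually enough for a short demo.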

Complete Usage Guide

Detailed docs on how to use DataBuilder can be found in the docs/ folder of this repo.