forecastflowml

Scalable machine learning forecasting framework with PySpark


Keywords
forecasting, lightgbm, machine-learning, python, spark, time-series, xgboost
License
MIT
Install
pip install forecastflowml==0.0.2

Documentation

ForecastFlowML: Scalable Machine Learning Forecasting with PySpark


ForecastFlowML is a scalable machine learning forecasting framework built on PySpark. It trains scikit-learn-like models in parallel by distributing the models, rather than the data, across the cluster.

With ForecastFlowML, you can use scikit-learn-like regressors as direct multi-step forecasters and train a separate model for each group in your dataset. The package uses PySpark to handle large datasets efficiently and to distribute model training across workers.
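To make "direct multi-step forecasting" and "a separate model for each group" concrete, here is a minimal sketch in plain Python. It does not use the ForecastFlowML API: the helper names (fit_line, fit_direct, forecast), the toy data, and the choice of a one-feature linear model are all assumptions made for illustration. The direct strategy fits one model per step ahead, so the model for step h learns to map the value at time t straight to the value at time t + h.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant input: fall back to predicting the mean.
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return mean_y - slope * mean_x, slope


def fit_direct(series, horizon):
    """Fit one model per step ahead: model h maps y[t] to y[t + h]."""
    models = {}
    for h in range(1, horizon + 1):
        inputs = series[:-h]   # values at time t
        targets = series[h:]   # values at time t + h
        models[h] = fit_line(inputs, targets)
    return models


def forecast(models, last_value):
    """Predict every step of the horizon from the last observed value."""
    return [a + b * last_value for _, (a, b) in sorted(models.items())]


# Hypothetical toy data: (group, value) rows already sorted by time.
rows = [("store_a", v) for v in [1, 2, 3, 4, 5, 6]]
rows += [("store_b", v) for v in [10, 20, 30, 40, 50, 60]]

series_by_group = defaultdict(list)
for group, value in rows:
    series_by_group[group].append(float(value))

# One independent set of horizon models per group.
group_models = {g: fit_direct(s, horizon=2) for g, s in series_by_group.items()}
forecasts = {g: forecast(group_models[g], series_by_group[g][-1])
             for g in series_by_group}
print(forecasts)
```

Because each group's models depend only on that group's rows, the per-group fits are independent of one another, which is what makes it possible to distribute models rather than data.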

Features

ForecastFlowML provides a range of features that make it a powerful and flexible tool for time-series forecasting, including:

  • Works with pandas and PySpark DataFrames.
  • Distributed model training per group in the DataFrame.
  • Direct multi-step forecasting.
  • Built-in time-based cross-validation.
  • Extensive time-series feature engineering (lag, rolling mean/std, stockout, history length).
  • Hyperparameter tuning for each group model with grid search.
  • Supports scikit-learn-like libraries such as LightGBM or XGBoost.
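Two of the ideas in this list, lag and rolling-mean features and time-based cross-validation, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the ForecastFlowML implementation: the function names (lag, rolling_mean, time_splits) are hypothetical, and computing the rolling mean over previous observations only is an assumption made here so that the current target does not leak into its own feature.

```python
def lag(values, k):
    """Value observed k steps earlier; None where there is no history yet."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling_mean(values, window):
    """Mean of the previous `window` values, excluding the current one."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)  # not enough history
        else:
            out.append(sum(values[i - window:i]) / window)
    return out


def time_splits(n_obs, n_folds, test_size):
    """Expanding-window splits: every test block lies after its training rows."""
    splits = []
    for fold in range(n_folds):
        test_end = n_obs - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough observations for the requested folds")
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_end))
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical daily sales for a single group.
sales = [3, 5, 4, 6, 8, 7, 9, 10]

lag_1 = lag(sales, 1)
roll_3 = rolling_mean(sales, 3)
splits = time_splits(len(sales), n_folds=2, test_size=2)

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike shuffled k-fold splits, each fold here trains only on observations that precede its test block, which mirrors how a forecaster is used in practice.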

Whether you're new to time-series forecasting or an experienced data scientist, ForecastFlowML can help you build and deploy accurate forecasting models at scale.

Documentation

See our latest documentation for full details.

Get Started

What is ForecastFlowML?

Quick Start

User Guide

Feature Engineering

Time Series Cross Validation

Grid Search

Feature Importance

Save/Load ForecastFlowML

Benchmarks

Kaggle Walmart M5 Forecasting Competition

  • Ranks 18th as a late submission, achieved with minimal effort.

Installation

ForecastFlowML installation

You can install the package using the following command:

pip install forecastflowml

Check Java

Make sure Java 11 is installed. You can check which Java version, if any, is available with the following command:

java -version
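If you would rather run this check from Python, the sketch below shows one way to do it. The helper names (parse_java_major, installed_java_major) are hypothetical and not part of ForecastFlowML. The sketch relies on two facts about the java launcher: it prints its version banner to stderr, and releases up to Java 8 report themselves as "1.x" (for example "1.8.0_371").

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_371" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The banner normally goes to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)


# Parsing sample banners does not require Java to be installed.
print(parse_java_major('openjdk version "11.0.19" 2023-04-18'))
print(parse_java_major('java version "1.8.0_371"'))
```

Calling installed_java_major() and comparing the result with 11 gives a quick pre-flight check before creating the Spark session.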

Set PYSPARK_PYTHON

In your Python script, set the PYSPARK_PYTHON environment variable to the path of your Python executable before creating the Spark session:

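import os
import sys

from pyspark.sql import SparkSession

# Point the Spark workers at the same interpreter that runs this script.
os.environ["PYSPARK_PYTHON"] = sys.executable

# Start a local session that uses all available cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()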