squaad

Helper functions for running queries, ML pipelines, and statistical analysis on the SQUAAD framework.


Keywords
csse, machine-learning, pipeline, software-engineering
License
MPL-2.0
Install
pip install squaad==2.1

Documentation

SQUAAD ANALYSIS FRAMEWORK

Installation

pip install squaad

Releases

  • V2.0 https://github.com/fostiropoulos/squaad/releases/download/v2.0/squaad-2.0.tar.gz

Install from a Release Archive

pip install squaad-2.0.tar.gz

Usage

Creating new database connection

# Arguments: path to the JSON config file, then the folder used to cache query results.
myConnection = db("config.json", "cache")
print("Connection Status: %s" % myConnection.testConnection())

Config.json and Cache

  • Config.json has the following format:
{"pgsql":{"host":"","user":"","passwd":"","db":""} }
  • The cache folder stores query results. When the same query is executed again, the cached result is used instead of querying the database.
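As a minimal sketch, a config file in the format shown above can be generated and checked with the standard library alone. The helper names `write_config` and `load_config` are hypothetical and are not part of squaad; the empty strings are placeholders to be replaced with real connection details.

```python
import json
import os
import tempfile

# Template matching the documented format; every value is a placeholder.
CONFIG_TEMPLATE = {"pgsql": {"host": "", "user": "", "passwd": "", "db": ""}}
REQUIRED_KEYS = {"host", "user", "passwd", "db"}


def write_config(path, cfg):
    """Write the connection settings to a JSON file (hypothetical helper)."""
    with open(path, "w") as fh:
        json.dump(cfg, fh, indent=2)


def load_config(path):
    """Read the file back and check that all expected keys are present."""
    with open(path) as fh:
        cfg = json.load(fh)
    missing = REQUIRED_KEYS - set(cfg.get("pgsql", {}))
    if missing:
        raise ValueError("config is missing keys: %s" % sorted(missing))
    return cfg


# Round-trip the template through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    write_config(config_path, CONFIG_TEMPLATE)
    loaded = load_config(config_path)

print(loaded)
```

In practice the file would be written once to the working directory and its path passed as the first argument when creating the connection.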

Games-Howell Statistics Test

stats.gamesHowellBinomial({"GROUP1":{True:100, False:3999}, "GROUP2":{True:2999,False:2939}})
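The call above takes, for each group, the counts of True and False outcomes. How squaad computes the test internally is not documented here, so the following is only a textbook-style sketch of the pairwise Games-Howell statistic for such data: each group is treated as a 0/1 sample, the difference in proportions is divided by the unpooled standard error, and the degrees of freedom come from the Welch-Satterthwaite formula. The function names are hypothetical. Obtaining a p-value would additionally require the studentized range distribution (with q = t·√2), which is omitted.

```python
import math

groups = {
    "GROUP1": {True: 100, False: 3999},
    "GROUP2": {True: 2999, False: 2939},
}


def summarize(counts):
    """Return (n, proportion of True, unbiased sample variance) for 0/1 data."""
    n = counts[True] + counts[False]
    p = counts[True] / n
    var = n * p * (1 - p) / (n - 1)
    return n, p, var


def games_howell_pair(a, b):
    """Return the t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, m1, v1 = summarize(a)
    n2, m2, v2 = summarize(b)
    se1, se2 = v1 / n1, v2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


t_stat, dof = games_howell_pair(groups["GROUP1"], groups["GROUP2"])
print("t = %.3f, df = %.1f" % (t_stat, dof))
```

With more than two groups, the same computation would be repeated for every pair of groups.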

Classification Pipeline with KFold Usage

Parameters

  • X Pandas DataFrame containing the data set. Each column is a feature.
  • Y Labels for the data set.
  • split_columns (Optional) Not yet implemented. Columns to split by, that is, columns that may introduce bias and should be taken into account during splitting.
  • kfolds (Optional) Number of folds to run.
  • classifiers (Optional) Dictionary of classifiers to use.
  • balancers (Optional) Balancers to run.

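To illustrate what the kfolds parameter controls, here is a minimal standard-library sketch of k-fold splitting: the samples are shuffled, divided into k folds, and each fold serves once as the test set while the rest form the training set. This is not squaad's implementation; the function name `kfold_indices`, the default of 5 folds, and the seed are assumptions made for the example.

```python
import random


def kfold_indices(n_samples, kfolds=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into kfolds folds of near-equal size.
    folds = [indices[i::kfolds] for i in range(kfolds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


splits = list(kfold_indices(n_samples=10, kfolds=5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

In a classification pipeline, each classifier would be fitted on the training indices and scored on the test indices, with the scores averaged across folds.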
Classifiers

Default Classifiers:

  • Nearest Neighbors
  • Linear SVM
  • RBF SVM
  • Gaussian Process
  • Decision Tree
  • Random Forest
  • Neural Net
  • AdaBoost
  • Naive Bayes
  • QDA

Balancers

Default Balancers:

  • Unbalanced
  • SMOTE
  • SMOTEENN
  • SMOTETomek
  • RandomUnderSampler
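As a rough illustration of what a balancer does, the sketch below performs random under-sampling with the standard library only: every class is reduced to the size of the smallest class by randomly discarding samples. It is a simplified stand-in written for this example, not the RandomUnderSampler listed above, and the function name `random_undersample` is hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Return (X, y) with every class reduced to the smallest class size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(y):
        by_label[label].append(i)
    target = min(len(idx) for idx in by_label.values())
    keep = []
    for idx in by_label.values():
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[i] for i in range(10)]
y = ["a"] * 7 + ["b"] * 3
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))
```

The SMOTE-based balancers take the opposite approach and synthesize additional minority-class samples instead of discarding majority-class ones.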

ML Pipeline examples

# df is assumed to be a pandas DataFrame with one row per sample.
X = df[['locs_inc', 'cplxs_inc', 'smls_inc', 'vuls_inc', 'fbgs_inc', 'locs_dec', 'cplxs_dec', 'smls_dec', 'vuls_dec', 'fbgs_dec']]  # feature columns
Y = df['affiliation']  # labels
mlPipeline.classificationPipeLineKfold(X, Y)