colda

Collaborative Data Analysis for All


Keywords
colda, collaborative, data, analysis, science, machine, learning, assisted-learning, collaborative-learning, data-analysis, deep-learning, distributed-machine-learning, machine-learning, supervised-learning
License
Apache-2.0
Install
pip install colda==0.0.1

Documentation



Collaborative Data Analysis for All

Introduction

ColDA is an open source project aimed at providing distributed machine learning tools for data analysis and machine learning based on Assisted Learning.

Features

  • Algorithm
  • Frontend
  • Backend
  • Package

Algorithm

The project uses Gradient Assisted Learning as the fundamental algorithm for collaboratively training distributed models.

Get started

  • Use data/make_dataset.py to split csv files
  • Use command in run_[dataset]_[number_of_sponsor]s_[number of assistor]a.sh to run experiments

Instructions

  • files ends with _exe.py are local operations
  • baseline.py produces baseline results on joint datasets
  • make_train_local.py produces baseline results on joint datasets
  • make_hash.py uses sha256 to encode identification for alignment
  • save_match_id.py saves hash results
  • make_match_idx.py match identification with hash results
  • make_residual.py computes residuals
  • save_residual.py saves residuals
  • make_train.py locally fits the residuals
  • save_output.py saves outputs of trained models
  • make_result.py produces aggregated results
  • make_test.py produces inference results
  • make_eval.py evaluates inference results

PyInstaller

conda create --name myenv python --no-default-packages  
conda activate myenv  
pip install pyinstaller  
pip install numpy  
pip install -U scikit-learn  
cd algorithm
pyinstaller run.spec  # To one folder
pyinstaller -F run.py  # To one folder

Frontend

Get started

Run the following command to launch the software for the first time:

sudo apt install npm

# update node
sudo npm cache clean -f
dudo npm install -g n
sudo n stable
PATH = "$PATH"

sudo snap install vue

npm install

npm run electron:serve

./node_modules/.bin/electron-rebuild # If there is bug on windows: .\node_modules\.bin\electron-rebuild

Run the following command to launch the software after first time:

npm install

npm run electron:serve

Run the following command to package the software:

npm install

npm run electron:build

Run the following command to run unittest:

npm run test

Instructions

  • Navbar.vue presents the software navigation bar, and the communication between the software and the backend is mainly completed by the functions in this file
  • assets folder contains image, font, css resources used in the software
  • components folder contains reusable interface components
  • network folder contains request sending and interception configuration
  • router folder conatins routing configuration file
  • store folder is used for storing some local information
  • Notifications folder contains functions that handle notifications and history
  • Auth folder contains functions that handle user registration and login
  • Settings folder contains functions that handle user customized settings
  • tests folder contains unittest function

Backend

Getting Started

  • launch procedures

    • export FLASK_APP=application.py (first time you clone the github)
    • pipenv install
    • pipenv shell
    • flask run
  • Unittest:

    • flask test (test all files, use this command in top file level)
    • notes: You could switch the test framework to pytest, which is more convenient
    • notes: tests/test_unread_test_output.py contains most the logic for your reference
  • Deploy:

    • Install some dependencies first follow this
    • heroku login (Use username and pwd in google drive key file)
    • git add .
    • git commit -m 'Commit_Name'
    • git push
    • git push heroku Current_branch_name
    • heroku open (view our app)

Package

Getting Started

Use case

  • Examples and Instructions can be found in examples/

Package Stucture

  • Basic package structure can be found in Github repository

  • Compared to the Basic package structure, docs/ will contain different element. But at this point, you can follow the template

  • py-pkg is the main part of the package, you can add more modules (with __init__.py) in this part. For example, if you add temp module, you can import temp module by:

import temp from py-pkg
  • This package structure can be improved by learning PyTorch package structure.

  • Basic Structure:

py-package-tempate/
 |-- docs/
 |-- |-- build_html/
 |-- |-- build_latex/
 |-- |-- source/
 |-- py-pkg/
 |-- |-- __init__.py
 |-- |-- __version__.py
 |-- |-- curves.py
 |-- |-- entry_points.py
 |-- tests/
 |-- |-- test_data/
 |-- |   |-- supply_demand_data.json
 |-- |   __init__.py
 |-- |   conftest.py
 |-- |   test_curves.py
 |-- .env
 |-- .gitignore
 |-- Pipfile
 |-- Pipfile.lock
 |-- README.md
 |-- setup.py

How to Manage Package Environment

  • pipenv is used to manage package. You can install pipenv by:
pip3 install pipenv
  • Use pipenv to install package. The first command is to install the package for development. The second command is to install the package for production.
pipenv install --dev
pipenv install 
  • Use pipenv to uninstall package:
pipenv uninstall

Pipenv Shells

  • Entering into a Pipenv-managed shell. Remeber doing this every time before running the project.
cd py-package-tempate
pipenv install
pipenv shell

License

ColDA is licensed under the Apache 2.0 License.

Code of Conduct

Please review and adhere to the Code of Conduct when contributing to ColDA.

Reference

Please use the following reference

@article{diao2022gal,
  title={GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations},
  author={Diao, Enmao and Ding, Jie and Tarokh, Vahid},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={11854--11868},
  year={2022}
}