dame
Release 0.0.2

Manage your dataflows seamlessly

License: MIT
Install: pip install dame==0.0.2

Documentation

Dataflow Awesome Managing Engine

The easiest dataflow managing framework - currently under construction.

DAME solves/facilitates:

Building datasets from files / folders
Transforming data in the right order
Saving transformed data - once computed never compute it again
Choosing the best transformation from a few configurations
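The "once computed never compute it again" idea can be illustrated with a small disk cache. This is only a sketch under stated assumptions, not DAME's actual API: `cached_compute`, its parameters, and the choice of pickle files named by a hash of the key are all hypothetical.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cached_compute(cache_dir, key, compute):
    """Return the cached value for `key`; compute and save it on first use.

    Hypothetical helper for illustration only, not part of DAME.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so that any string becomes a safe file name.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".pkl"
    path = cache_dir / name
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    value = compute()
    with path.open("wb") as fh:
        pickle.dump(value, fh)
    return value


calls = []


def expensive():
    calls.append(1)  # record every real computation
    return sum(range(1000))


with tempfile.TemporaryDirectory() as tmp:
    first = cached_compute(tmp, "sum-1000", expensive)
    second = cached_compute(tmp, "sum-1000", expensive)  # read from disk
```

The second call returns the stored result without running `expensive` again.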

Great for working with NumPy, PyTorch and more.

Vision

Technically:

Compute stages:
1. Sources - get data element
2. Transforms - compute something out of available data
3. Reducers - compute something on the whole dataset
Combining data sources
Compute only what you need - optimized performance via DAGs
Backup and cache after stages, with support for custom serializers
Ranking various configurations
(Optional) Parallel processing
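The compute stages and the "compute only what you need" point above can be sketched as follows. This is an assumption-laden illustration, not DAME's implementation: the `STAGES` table, `required_stages` and `compute_item` are hypothetical names, and the ordering uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage table: name -> (function, names of the stages it reads).
# A stage with no inputs acts as a Source and receives the raw element.
STAGES = {
    "number": (lambda raw: raw, []),
    "squared": (lambda number: number * number, ["number"]),
    "negated": (lambda number: -number, ["number"]),
    "squared_plus_one": (lambda squared: squared + 1, ["squared"]),
}


def required_stages(stages, targets):
    """Collect the targets plus everything they depend on."""
    needed, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(stages[name][1])
    return needed


def compute_item(stages, raw, targets):
    """Compute only the stages needed for `targets`, in dependency order."""
    needed = required_stages(stages, targets)
    graph = {name: set(stages[name][1]) for name in needed}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = stages[name]
        values[name] = func(*(values[d] for d in deps)) if deps else func(raw)
    return values


dataset = [1, 2, 3]
results = [compute_item(STAGES, raw, ["squared_plus_one"]) for raw in dataset]
# A Reducer works on the whole dataset rather than on a single element.
total = sum(r["squared_plus_one"] for r in results)
```

Because only `squared_plus_one` is requested, the `negated` stage is never evaluated.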

Priorities:

Easy to use
Batteries included
Little overhead - take advantage of fastest tools available
Integrates seamlessly with other tools
Expandable

Nice to have:

Few Python dependencies
Integrate tqdm
DAG output

Backlog:

1.0.0:

- Dataset - compute items via Sources and Transforms
- Dataset - compute stage by stage (assequence)
- Dataset - validate Transforms
- Dataset - (_Stages) DAG computations
- Dataset - Automatic (Transform) versioning based on source and attrs
- Workers - MultiThreading / MultiProcessing
- Dataset - Building context for transforms
- Storage - SQLite
- WIP - Dataset - Enable Storage & Caching
- Reducer - Scoring
- Reducer - Ranking configurations, Find optimal parameters
- Stages - Make an actual DAG instead of topsort
- Cache - Ring
- Dataset - Compute by chunks for efficient cache
- Transform - Mapping Transform, Sequential transform
- Transform - Delete intermediate result
- Dataset - Autodelete unrequired objects from memory (Autosequential)
- Docs - Dame tutorial & more tests
- TODOS - Resolve remaining TODOs in the code
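One possible reading of the "automatic (Transform) versioning based on source and attrs" item is to derive a version string from a hash of the transform's source text and its attributes. The sketch below assumes exactly that. `transform_version` is a hypothetical helper, not DAME's API, and the source text is passed in as a plain string (in practice it might come from `inspect.getsource`).

```python
import hashlib
import json


def transform_version(source_text, attrs):
    """Derive a short version id from a transform's source and attributes.

    Hypothetical helper: any change to the code or to an attribute value
    produces a different id, so stale cached results can be detected.
    """
    payload = json.dumps({"source": source_text, "attrs": attrs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


src = "def scale(x, factor):\n    return x * factor\n"
v1 = transform_version(src, {"factor": 2})
v2 = transform_version(src, {"factor": 2})  # same code, same attrs
v3 = transform_version(src, {"factor": 3})  # attribute changed
v4 = transform_version(src.replace("*", "+"), {"factor": 2})  # code changed
```

Identical inputs give the same id, while a change in either the source or the attributes gives a new one.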

Storage/Cache options:

Pickle
Joblib
Redis
SQLite
PyTables
Parquet/Dask
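As an illustration of how one of these backends could sit behind a common interface with a pluggable serializer, here is a minimal SQLite-backed key-value store. It is a sketch only: the `SQLiteStore` class and its `put`/`get` methods are hypothetical and do not come from DAME. Only the standard `sqlite3`, `pickle` and `json` modules are used.

```python
import json
import pickle
import sqlite3


class SQLiteStore:
    """Hypothetical key-value store with a swappable serializer."""

    def __init__(self, path=":memory:", dumps=pickle.dumps, loads=pickle.loads):
        self._conn = sqlite3.connect(path)
        self._dumps, self._loads = dumps, loads
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    def put(self, key, value):
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                (key, self._dumps(value)),
            )

    def get(self, key, default=None):
        row = self._conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else self._loads(row[0])


# Default serializer: pickle.
store = SQLiteStore()
store.put("item-0", {"x": [1, 2]})
restored = store.get("item-0")

# Custom serializer: JSON encoded as UTF-8 bytes.
json_store = SQLiteStore(
    dumps=lambda value: json.dumps(value).encode("utf-8"),
    loads=lambda blob: json.loads(blob.decode("utf-8")),
)
json_store.put("item-0", [1, 2, 3])
json_restored = json_store.get("item-0")
```

Swapping the `dumps`/`loads` pair is enough to change the on-disk format without touching the storage logic.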

2.0.0 Ideas:

Easy reuse of Dame transforms in Luigi/Dask/Apache Hadoop
More built-in storage and cache options
Built-in datasets like torchvision.MNIST etc
Module for managing on disk datasets. GUI? Conversion between:
- PyTorch ImageFolder
- Images + csv
- Some Other

Development:

- tox - build
- tox - publish
- hosting docs on readthedocs
- tox - publish docs
- coverage
- badges

Dependencies: 2
Dependent packages: 0
Dependent repositories: 0
Total releases: 2
Latest release: Mar 11, 2020
First release: Nov 2, 2019
Stars: 0
Forks: 0
Watchers: 1
Contributors: 1
Repository size: 35.2 KB
SourceRank: 7

Releases

0.0.2: Mar 11, 2020
0.0.1: Nov 2, 2019

Last synced: 2021-02-13 19:33:48 UTC
