Samples-filter is a command-line filter for GitHub repositories that contain samples instead of a real project, framework, or library, e.g. leeowenowen/rxjava-examples, streaming-with-flink/examples-java, redisson/redisson-examples.
Motivation. While working on the CaM project, where we build datasets of open-source Java programs, we discovered the need to filter out repositories that contain not real code but rather samples, tutorials, or examples. This repository is a portable command-line tool that filters out such sample repositories.
First, install it from PyPI:
pip install samples-filter
Then run:
samples-filter filter --repositories=repos.csv --out=filtered.csv
For --repositories, provide the name of an existing CSV dataset of GitHub repositories; for --out, provide the name of the output file (it will be created automatically). If you feel lost, try --help and the tool will explain what to do.
Optionally, you can choose which model to use for filtering via --model. You can pass either transformer (the default) or rf.
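To compare the two models on the same dataset, a small wrapper can run the filter once per model. This is a minimal sketch, assuming samples-filter is installed and that --model accepts the same --flag=value form as --repositories and --out; the function name run_both_models and the output names filtered-<model>.csv are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run samples-filter once per model on the same input CSV.
# Assumes --model accepts the --flag=value form, like --repositories and --out.
run_both_models() {
  local input="$1"
  # Fail early if the input dataset is missing, since --repositories
  # must point to an existing CSV file.
  if [ ! -f "$input" ]; then
    echo "input file not found: $input" >&2
    return 1
  fi
  local model
  for model in transformer rf; do
    # Each model writes to its own output file, which the tool creates.
    samples-filter filter --repositories="$input" \
      --out="filtered-${model}.csv" --model="$model" || return 1
  done
}
```

Usage: run_both_models repos.csv, then compare filtered-transformer.csv with filtered-rf.csv.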
Fork the repository, make changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, please run the full build before sending us your pull request:
make install cov check
To set up a virtual environment, use these commands:
python3 -m venv venv
source $(pwd)/venv/bin/activate
You will need Python 3.11+ installed.
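Before creating the virtual environment, you may want to confirm that your interpreter meets the 3.11+ requirement. A minimal sketch, assuming python3 on your PATH is the interpreter you will use for the venv; the function name python_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if python3 is at least MAJOR.MINOR (default 3.11).
python_at_least() {
  local major="${1:-3}" minor="${2:-11}"
  # sys.version_info compares element-wise against the (major, minor) tuple.
  python3 -c "import sys; sys.exit(0 if sys.version_info >= ($major, $minor) else 1)"
}
```

Usage: python_at_least 3 11 && python3 -m venv venv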