github2pandas-manager

Aggregation of GitHub activities across multiple repositories, based on github2pandas


Keywords
git, github, mining, learning, analytics, git-miner, git-mining-tool, learning-analytics, python
License
BSD-3-Clause-Attribution
Install
pip install github2pandas-manager==0.0.9

Documentation

github2pandas_manager Introduction

github2pandas_manager coordinates data aggregation for multiple GitHub repositories. The user selects repositories by name, name pattern, organization or individual query, and github2pandas_manager provides a collection of versions, releases, pull requests, etc. For this purpose, github2pandas_manager reads a configuration file (YAML), collects the referenced repositories and provides the requested information as pandas DataFrames or CSV files.

See the documentation of github2pandas to become familiar with the individual aggregation classes.

Application example

Video: Automated supervision of student programming activities in GitHub Classrooms (mp4)

Concept

Workflow

Installation

github2pandas-manager is available on PyPI. Use pip to install the package.

Global installation

On Linux:

sudo pip3 install github2pandas-manager 
sudo pip install github2pandas-manager

On Windows, as administrator or for the current user only:

pip install github2pandas-manager
pip install --user github2pandas-manager

In a virtual environment:

pipenv install github2pandas-manager
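To check whether the installation succeeded, the installed version can be queried from Python. This is a minimal sketch using the standard library module importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical and not part of github2pandas-manager.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "github2pandas-manager") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "github2pandas-manager is not installed")
```

Run it with the same interpreter (or inside the same pipenv environment) that you installed the package into; otherwise it reports the package as missing.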

In addition, a GitHub token is required for authentication. GitHub's documentation describes how to generate one for your account. Add your token to a hidden .env file; an example is given in .env.example.
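How github2pandas_manager itself reads the token is not described here. As a rough illustration of the .env mechanism, the sketch below parses simple KEY=VALUE lines with the standard library only. The variable name GITHUB_TOKEN and the helpers load_env_file and get_github_token are assumptions; use the key name given in .env.example.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines, comments ('#') and lines without '=' are skipped, and
    matching quotes around a value are removed. This is a simplified
    parser, not a complete implementation of the .env format.
    """
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_github_token(path: str = ".env", key: str = "GITHUB_TOKEN") -> str:
    """Return the token, preferring the process environment over the file.

    The default key name GITHUB_TOKEN is an assumption (hypothetical).
    """
    token = os.environ.get(key) or load_env_file(path).get(key)
    if not token:
        raise RuntimeError(f"No {key} found in the environment or in {path}")
    return token
```

Keep the .env file out of version control, since it contains a secret.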

Run examples

The examples folder contains four types of query configuration for different purposes:

  • Repo names: list all relevant repositories by username and repository name. Keywords: repo_names. Example: ProjectsByRepoNames.yml
  • Repo name patterns: describe relevant repositories by white and black patterns. Keywords: repo_white_pattern, repo_black_pattern. Example: ProjectsByRepoNamePatterns.yml
  • Repos by organizations: select all repositories of an organization account. Keywords: organization_names. Example: ProjectsByOrganizations.yml
  • Repos by a set of query parameters: select all repositories according to programming language, stars, etc. Keywords: language, start_date, end_date, star_filter. Example: ProjectsByQuery.yml
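The white/black pattern selection can be pictured as a filter over repository names. The sketch below assumes the patterns are regular expressions and that a repository is kept if it matches at least one white pattern and no black pattern; the actual pattern semantics in github2pandas_manager may differ, and select_repositories as well as the repository names are hypothetical.

```python
import re
from typing import Iterable, List


def select_repositories(
    repo_names: Iterable[str],
    white_patterns: Iterable[str],
    black_patterns: Iterable[str] = (),
) -> List[str]:
    """Keep names that match at least one white pattern and no black pattern."""
    white = [re.compile(p) for p in white_patterns]
    black = [re.compile(p) for p in black_patterns]
    selected = []
    for name in repo_names:
        # re.search finds the pattern anywhere in the name; use ^ and $ to anchor it.
        if any(w.search(name) for w in white) and not any(b.search(name) for b in black):
            selected.append(name)
    return selected


repos = ["course-2021-alice", "course-2021-bob", "course-2021-template", "other-project"]
print(select_repositories(repos, [r"^course-2021-"], [r"template$"]))
# ['course-2021-alice', 'course-2021-bob']
```

A black pattern like this is handy in classroom settings, where one template repository shares the naming scheme of all student repositories.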

To start an example, run:

pipenv run python -m github2pandas_manager -path ./examples/ProjectsByQuery.yml
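To run the four example configurations one after another from Python, the documented command can be assembled per file. This is a sketch under the assumption that the command line shown above is the only interface needed; build_command, run_example and EXAMPLE_CONFIGS are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List

# Example files listed in this README.
EXAMPLE_CONFIGS = [
    "ProjectsByRepoNames.yml",
    "ProjectsByRepoNamePatterns.yml",
    "ProjectsByOrganizations.yml",
    "ProjectsByQuery.yml",
]


def build_command(config_file: str, examples_dir: str = "./examples") -> List[str]:
    """Assemble the documented command for one example configuration."""
    config_path = Path(examples_dir) / config_file
    return ["pipenv", "run", "python", "-m", "github2pandas_manager",
            "-path", config_path.as_posix()]


def run_example(config_file: str) -> int:
    """Run one example and return the exit code of the process."""
    return subprocess.run(build_command(config_file), check=False).returncode
```

Usage: `for cfg in EXAMPLE_CONFIGS: run_example(cfg)`. Each run queries the GitHub API and therefore needs the token described above.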

YAML-Configuration schema

In addition to the specific configuration parameters mentioned above, each configuration includes three further entries: project_name, project_folder and content.

The first two determine the folder structure in which the data is stored; content describes the repository data to be aggregated:

  • Repository
  • Issues
  • Version
  • PullRequests
  • Workflows
  • GitReleases
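The schema above can be checked before starting a long aggregation run. The sketch below represents a loaded configuration as a plain Python dict (the result of parsing the YAML file) and reports missing entries or unknown content types. The flat layout of the dict, the helper check_config and all example values are assumptions; compare with the yml files in the examples folder for the actual structure.

```python
from typing import Any, Dict, List

# Data categories listed in this README for the "content" entry.
CONTENT_TYPES = {"Repository", "Issues", "Version", "PullRequests", "Workflows", "GitReleases"}

# Selection keywords from the list of example configurations in this README.
SELECTION_KEYS = {"repo_names", "repo_white_pattern", "repo_black_pattern",
                  "organization_names", "language", "start_date", "end_date", "star_filter"}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a configuration dict (empty if none)."""
    problems: List[str] = []
    for key in ("project_name", "project_folder", "content"):
        if key not in config:
            problems.append(f"missing required entry: {key}")
    if not SELECTION_KEYS.intersection(config):
        problems.append("no repository selection keyword given")
    for item in config.get("content", []):
        if item not in CONTENT_TYPES:
            problems.append(f"unknown content type: {item}")
    return problems


# Hypothetical configuration, as it might look after loading the YAML file.
example = {
    "project_name": "DemoCourse",
    "project_folder": "data",
    "repo_names": ["octocat/Hello-World"],
    "content": ["Repository", "Issues", "Commits"],
}
print(check_config(example))
# ['unknown content type: Commits']
```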

An overview of the information contained in each data frame can be found in the wiki of the github2pandas project.