github-collect

A tool for collecting GitHub repository metadata


Keywords
git github collect dump metadata
License
GPL-3.0+
Install
pip install github-collect==0.9.0

Documentation

GitHub-Collect

This package collects metadata of GitHub repositories.

This package works in two modes:

  • Getting metadata of all GitHub repositories
  • Getting metadata of the GitHub repositories that match a search filter

The package backs off and retries when the API rate limit is reached, when a network problem occurs, or when GitHub answers with "incomplete results".
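
The retry behaviour described above can be illustrated with a minimal sketch of exponential backoff around an asynchronous request. This is not the package's actual implementation: the names with_backoff, TransientError, make_request and the delay values are assumptions made for illustration only.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for a rate limit, a network failure, or incomplete results."""


async def with_backoff(make_request, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=asyncio.sleep):
    """Await make_request() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return await make_request()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up after the last allowed attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel workers do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay / 10))
```

Here make_request is any zero-argument coroutine function that raises TransientError on a recoverable failure. The sleep parameter exists only so that the waiting can be replaced in tests.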

In search filter mode it uses a "divide-and-conquer" approach on the repository creation date to work around the 1000-repository limit of the GitHub search API, so that all repositories matching the filter are retrieved.
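
The divide-and-conquer idea can be sketched as follows: a creation-date window is halved recursively until each window matches no more than 1000 repositories. This is an illustrative sketch under stated assumptions, not the package's code; split_ranges and the count_repos callable are hypothetical names.

```python
from datetime import datetime, timedelta

SEARCH_LIMIT = 1000  # the search API returns at most 1000 results per query


def split_ranges(count_repos, start, end, limit=SEARCH_LIMIT,
                 min_span=timedelta(seconds=1)):
    """Return (start, end) creation-date windows that each match at most `limit` repos.

    `count_repos(start, end)` is a hypothetical callable that returns how many
    repositories were created in the half-open window [start, end).
    """
    # Stop when the window is small enough, or too short to split any further.
    if count_repos(start, end) <= limit or end - start <= min_span:
        return [(start, end)]
    middle = start + (end - start) / 2
    return (split_ranges(count_repos, start, middle, limit, min_span)
            + split_ranges(count_repos, middle, end, limit, min_span))
```

Each resulting window could then be queried separately, for instance by adding a created:START..END qualifier to the search filter, and the results merged.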

Only Python 3.5+ is supported.

To install use:

pip install github_collect

Example that gets all GitHub repositories:

import asyncio
import github_collect

# Run the collection coroutine to completion on the default event loop.
event_loop = asyncio.get_event_loop()
event_loop.run_until_complete(
    github_collect.get_all_repos('/path/to/store', api_token='your_api_token_for_github')
)

Example that gets all GitHub C and C++ repositories:

import asyncio
import github_collect

# The second argument is the list of search filters, here one per language.
event_loop = asyncio.get_event_loop()
event_loop.run_until_complete(
    github_collect.get_all_repos('/path/to/store', ['language:cpp', 'language:c'], api_token='your_api_token_for_github')
)