ccontext (collect-context) is a cross-platform utility designed to streamline the process of gathering and sending the context of a directory to large language models (LLMs) like ChatGPT-4o. Our mission is to make collecting and sending context to an LLM as easy as possible.
- π Easy Setup: Quick installation and configuration.
- π§ Configurable Exclusions and Inclusions: Flexibly specify which files and directories to include or exclude.
- βοΈ Tokenization and Chunking: Automatically handles tokenization and chunking to stay within LLM token limits.
- π Cross-Platform Support: Supports Windows, macOS, and Linux.
- π£οΈ Verbose Output: Optional verbose mode for detailed output and debugging.
- π Markdown and PDF Generation: Generate detailed Markdown and PDF files of the directory structure and file contents.
- π Crawling of (documentation) Sites: Crawl and gather data from multiple sites using a specified list of URLs.
- π Prompt Templates (Upcoming): Create and use custom templates for different types of prompts.
kiko@lappie:~/.USER_SCRIPTS/ccontext$ ccontext
Using user config file: /home/kiko/.ccontext/config.json
Root Path: /home/kiko/.USER_SCRIPTS/ccontext
π ccontext
π .github
π workflows
π 751 publish-to-pypi.yml
π 1664 .gitignore
π .vscode
π 13 settings.json
π 9 MANIFEST.in
π 1392 README.md
π ccontext
π 0 Helvetica.ttf
π 0 NotoEmoji-VariableFont_wght.ttf
π 0 __init__.py
π 18 __main__.py
π 452 argument_parser.py
π 79 cli.py
π 699 clipboard.py
π 216 config.json
π 231 configurator.py
π 168 content_handler.py
π 209 file_node.py
π 779 file_system.py
π 655 file_tree.py
π 1089 main.py
π 800 md_generator.py
π 774 output_handler.py
π 1536 pdf_generator.py
π 975 tokenizer.py
π 414 utils.py
π 135 ideas.MD
π 56 requirements.txt
π 87 run_ccontext.sh
π 291 setup.py
Tokens: 14,718/32,000
Output copied to clipboard!
ccontext is available on PyPI and can be installed using pip:
pipx install ccontext
-
Clone the repository:
git clone https://github.com/oxillix/ccontext.git cd ccontext
-
Set up a virtual environment:
python3 -m venv venv source venv/bin/activate # On Windows, use `venv\Scripts\activate`
-
Install dependencies:
pip install -r requirements.txt
-
Install the package:
pip install .
-
Run
ccontext
in the current folder with default settings defined in~/.ccontext/config.json
:ccontext
-
Specify a root path, exclusions, and inclusions:
ccontext -p /path/to/directory -e ".git|node_modules" -i "important_file.txt|docs"
-
-h, --help
: Show help message. -
-p, --root_path
: The root path to start the directory tree (default: current directory). -
-e, --excludes
: Additional files or directories to exclude, separated by|
, e.g.,node_modules|.git
. -
-i, --includes
: Files or directories to include, separated by|
, e.g.,important_file.txt|docs
. -
-m, --max_tokens
: Maximum number of tokens allowed before chunking. -
-c, --config
: Path to a custom configuration file. -
-v, --verbose
: Enable verbose output to stdout. -
-ig, --ignore_gitignore
: Ignore the.gitignore
file for exclusions. -
-g, --generate-pdf
: Generate a PDF of the directory tree and file contents. -
-gm, --generate-md
: Generate a Markdown file of the directory tree and file contents. -
--crawl
: Crawls the sites specified in the config
ccontext -p /home/user/project -e ".git|build" -i "README.md|src"
You can customize the behavior of ccontext
by creating a configuration file. The default configuration file is config.json
located in the user's home directory under .ccontext
. You can also provide a custom configuration file via the -c
argument.
{
"verbose": false, // prints more data on the screen
"max_tokens": 32000, // max token size of input prompt / maximum size of the chunks
"model_type": "gpt-4o", // sets tiktoken.encoding_for_model()
"buffer_size": 0.05, // a buffer for max_tokens that limits how full the chunks can be
"excluded_folders_files": [
".git",
"bin",
"build",
"node_modules",
"venv",
"__pycache__",
"package-lock.json",
"ccontext.egg-info",
"dist",
"__tests__",
"coverage",
".next",
"pnpm-lock.yaml",
"poetry.lock",
"ccontext-output.pdf",
"ccontext-output.md",
".phpstorm.meta.php",
"*.min.js",
"composer.lock",
"*.lock",
"vendor",
"laravel_access.log"
],
"included_folders_files": [],
"context_prompt": "[[SYSTEM INSTRUCTIONS]] The following output represents a detailed directory structure and file contents from a specified root path. The file tree includes both excluded and included files and directories, clearly marking exclusions. Each file's content is displayed with comprehensive headings and separators to enhance readability and facilitate detailed parsing for extracting hierarchical and content-related insights. If the data represents a codebase, interpret and handle it as such, providing appropriate assistance as a programmer AI assistant. [[END SYSTEM INSTRUCTIONS]]",
"urls_to_crawl": [
{
"url": "https://www.django-rest-framework.org/",
"match": [
"https://www.django-rest-framework.org/**"
],
"exclude": [
"https://www.django-rest-framework.org/community/**"
],
"selector": "",
"maxPagesToCrawl": 100,
"outputFileName": "django-rest-framework.org.json",
"maxTokens": 2000000
}
]
}
- Codebase Context: Send the entire codebase as context to an LLM in one go, avoiding the need to copy and paste snippets manually.
- Document Generation: Generate detailed Markdown and PDF files of your directory structure and file contents, to easily RAG upon.]
- Documentation crawling: crawl any (documentation) site there is, and use it for sending context
We welcome contributions to ccontext
! Please follow these steps to contribute:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and push them to your branch.
- Submit a pull request with a description of your changes.
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by the need to streamline the process of providing context to LLMs.
- Thanks to the contributors and users who have provided valuable feedback and suggestions.
Here are some ideas that might be implemented in future versions of ccontext
:
- Document Support: Incorporate the ability to handle documents such as PDFs and image files in prompts.
- Binary File Handling: Introduce mechanisms to manage non-text file types effectively.
Feel free to raise issues or contribute to the project. We appreciate your support!
Nicolas Arnouts
arnouts.software@gmail.com