Rust Crate Pipeline v2.2.0

A pipeline for gathering, enriching, and analyzing metadata for Rust crates. It combines LLM-based enrichment, web scraping, and dependency analysis, supports multiple LLM providers, and can optionally apply the Sigil Protocol for Sacred Chain analysis.

🚀 Quick Start

Option 1: Install via pip (Recommended for users)

# Install the package
pip install rust-crate-pipeline

# Run with Azure OpenAI (most common)
python run_with_llm.py --provider azure --model gpt-4o

# Or use the module directly
python -m rust_crate_pipeline --llm-provider azure --llm-model gpt-4o

Option 2: Clone and run from repository (Recommended for developers)

# Clone the repository
git clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git
cd SigilDERG-Data_Production

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install Playwright browsers (required for web scraping)
playwright install

# Run the pipeline
python run_with_llm.py --provider azure --model gpt-4o

✨ Key Features

🤖 Multi-Provider LLM Support: Azure OpenAI, OpenAI, Anthropic, Ollama, LM Studio, Lambda.AI, and all LiteLLM providers
🌐 Advanced Web Scraping: Crawl4AI + Playwright for intelligent content extraction
⚡ Auto-Resume Capability: Automatically skips already processed crates
📊 Real-time Progress Tracking: Comprehensive monitoring and error recovery
🔒 Sigil Protocol Support: Sacred Chain analysis with IRL trust scoring
🐳 Docker Support: Containerized deployment with docker-compose
📦 Batch Processing: Configurable memory optimization and cost control
🛡️ Security Analysis: Privacy and security scanning with Presidio
📈 Comprehensive Output: JSON metadata with detailed crate analysis

📋 Requirements

Python 3.12+ (required)
Git (for repository operations)
Cargo (for Rust crate analysis)
Playwright browsers (installed by running playwright install)
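
The requirements above can be checked before a long run. The following is a minimal sketch, not part of the package: the function name check_requirements and the REQUIRED_TOOLS tuple are illustrative, and finding the playwright command on PATH does not prove that the browsers themselves have been downloaded.

```python
import shutil
import sys
from typing import Callable, Optional

# Command-line tools named in the requirements list; this tuple is illustrative.
REQUIRED_TOOLS = ("git", "cargo", "playwright")


def check_requirements(
    version_info: tuple = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(version_info) < (3, 12):
        problems.append(
            f"Python 3.12+ is required, found {version_info[0]}.{version_info[1]}"
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


# Report anything that is missing in the current environment.
for problem in check_requirements():
    print("missing:", problem)
```

The version_info and which parameters exist only so the check can be exercised without changing the real environment.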

🔧 Installation & Setup

For End Users (pip install)

# Install the package (includes all dependencies)
pip install rust-crate-pipeline

# Install Playwright browsers
playwright install

# Set up environment variables (optional but recommended)
export AZURE_OPENAI_ENDPOINT="your_endpoint"
export AZURE_OPENAI_API_KEY="your_api_key"
export GITHUB_TOKEN="your_github_token"

For Developers (repository clone)

# Clone the repository
git clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git
cd SigilDERG-Data_Production

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install Playwright browsers
playwright install

# Set up environment variables
export AZURE_OPENAI_ENDPOINT="your_endpoint"
export AZURE_OPENAI_API_KEY="your_api_key"
export GITHUB_TOKEN="your_github_token"

🎯 Usage Examples

Basic Usage

# Resume processing with Azure OpenAI (recommended)
python run_with_llm.py --provider azure --model gpt-4o

# Process specific crates with OpenAI
python run_with_llm.py --provider openai --model gpt-4 --api-key YOUR_KEY --crates tokio serde

# Use local Ollama model
python run_with_llm.py --provider ollama --model llama2

# Process from file with custom batch size
python run_with_llm.py --provider azure --model gpt-4o --crates-file data/crate_list.txt --batch-size 5
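
The --crates-file option reads crate names from a file. The README does not specify the file format, so the sketch below assumes one crate name per line; that format and the helper name write_crate_list are assumptions, and it is worth checking them against the file shipped in the repository.

```python
from pathlib import Path


def write_crate_list(path: Path, crates: list[str]) -> list[str]:
    """Write unique, non-empty crate names to `path`, one per line (assumed format)."""
    seen = set()
    cleaned = []
    for name in crates:
        name = name.strip()
        # Skip blanks and duplicates while keeping the original order.
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

For example, write_crate_list(Path("data/crate_list.txt"), ["tokio", "serde"]) would produce a file that can be passed to --crates-file as in the command above.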

Advanced Usage

# Custom configuration with all options
python run_with_llm.py \
  --provider azure \
  --model gpt-4o \
  --batch-size 10 \
  --max-tokens 2048 \
  --checkpoint-interval 5 \
  --log-level DEBUG \
  --output-path ./results \
  --skip-problematic

# Use the module directly (alternative entry point)
python -m rust_crate_pipeline \
  --llm-provider azure \
  --llm-model gpt-4o \
  --limit 50 \
  --batch-size 5 \
  --output-dir ./data \
  --log-level DEBUG

# Enable Sigil Protocol for Sacred Chain analysis (module entry point only)
python -m rust_crate_pipeline --enable-sigil-protocol --crates tokio serde

Docker Usage

# Build and run with Docker
docker-compose up --build

# Or build manually
docker build -t rust-crate-pipeline .
docker run -e AZURE_OPENAI_API_KEY=your_key rust-crate-pipeline

🔑 Environment Variables

Configure your LLM providers and API keys:

# Azure OpenAI (recommended)
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your_api_key"
export AZURE_OPENAI_DEPLOYMENT_NAME="your_deployment"

# OpenAI
export OPENAI_API_KEY="your_api_key"

# Anthropic
export ANTHROPIC_API_KEY="your_api_key"

# GitHub (for enhanced metadata)
export GITHUB_TOKEN="your_github_token"

# Lambda.AI
export LAMBDA_API_KEY="your_api_key"
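
A missing key usually surfaces only after the pipeline has started, so it can help to check the environment first. This is a minimal sketch: the variable names come from the list above, but the grouping by provider, the provider keys (for example "lambda"), and the function name missing_env_vars are assumptions rather than part of the package.

```python
import os
from typing import Mapping

# Variable names are from this README; which provider needs which variable
# is an assumption made for illustration.
PROVIDER_ENV_VARS = {
    "azure": (
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_DEPLOYMENT_NAME",
    ),
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "lambda": ("LAMBDA_API_KEY",),
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables for `provider` that are unset or empty."""
    required = PROVIDER_ENV_VARS.get(provider, ())
    return [name for name in required if not env.get(name)]


# Example: list what is still missing for Azure OpenAI in this shell.
print(missing_env_vars("azure"))
```

Providers without an entry in the table, such as a local Ollama model, are treated as needing no variables.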

🏗️ Architecture Overview

Entry Points

The pipeline provides two main entry points:

run_with_llm.py - Comprehensive script with full LLM provider support
- Supports all LLM providers (Azure, OpenAI, Anthropic, Ollama, etc.)
- Advanced configuration options and batch processing
- Auto-resume capability and progress tracking
- Recommended for most users

python -m rust_crate_pipeline - Module entry point with Sigil Protocol support
- Includes Sigil Protocol for Sacred Chain analysis
- IRL trust scoring and cryptographic audit trails
- Simplified configuration for focused analysis
- Use with --enable-sigil-protocol flag
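
The two entry points use different flag names (--provider and --model for the script, --llm-provider and --llm-model for the module), which is easy to mix up when scripting runs. The sketch below builds an argument list for either one. The flags are the ones shown in this README; combining --llm-provider, --enable-sigil-protocol, and --crates in a single module invocation is an assumption, and build_command is a hypothetical helper.

```python
import sys


def build_command(
    provider: str, model: str, crates: list[str], sigil: bool = False
) -> list[str]:
    """Build an argv list for one of the two entry points."""
    if sigil:
        # Module entry point: the only one documented with Sigil Protocol support.
        cmd = [
            sys.executable, "-m", "rust_crate_pipeline",
            "--llm-provider", provider,
            "--llm-model", model,
            "--enable-sigil-protocol",
        ]
    else:
        # Script entry point: expects to be run from the repository root.
        cmd = [sys.executable, "run_with_llm.py", "--provider", provider, "--model", model]
    if crates:
        cmd += ["--crates", *crates]
    return cmd


print(build_command("azure", "gpt-4o", ["tokio", "serde"], sigil=True))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues.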

Core Components

UnifiedLLMProcessor - Handles all LLM providers uniformly
UnifiedSigilPipeline - Core pipeline orchestration with Sigil Protocol support
Crawl4AI Integration - Advanced web scraping with AI extraction
ProgressMonitor - Real-time progress tracking and auto-resume

Key Features

Auto-resume: Automatically skips already processed crates
Progress tracking: Real-time monitoring with detailed logging
Error recovery: Robust error handling and retries
Memory optimization: Configurable batch sizes for different environments
Cost control: Budget management and tracking
Sigil Protocol: Sacred Chain analysis with IRL trust scoring
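
To illustrate the auto-resume idea, the sketch below filters out crates that already have a result file. It is not the pipeline's implementation: the one-file-per-crate layout (<crate>.json in the output directory) and the name pending_crates are assumptions made for the example.

```python
from pathlib import Path


def pending_crates(crates: list[str], output_dir: Path) -> list[str]:
    """Return crates that have no result file yet, preserving input order."""
    if output_dir.is_dir():
        # Assumed layout: one "<crate>.json" file per processed crate.
        done = {p.stem for p in output_dir.glob("*.json")}
    else:
        done = set()
    return [name for name in crates if name not in done]
```

Checking for existing output on each run keeps the resume logic stateless: an interrupted run can be restarted with the same crate list.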

📚 Documentation

LLM Provider Guide - Complete LLM provider setup and usage
Configuration Guide - Detailed configuration options
Performance Optimization - Tuning for production use
Lambda.AI Setup - Lambda.AI specific configuration
Crawl4AI Analysis - Web scraping implementation details

🛠️ Development

Building and Testing

# Build package
python -m build

# Run tests
pytest --cov=rust_crate_pipeline tests/

# Type checking
pyright rust_crate_pipeline/
mypy rust_crate_pipeline/

# Code formatting
black rust_crate_pipeline/
isort rust_crate_pipeline/

# Linting
flake8 rust_crate_pipeline/

# Security checks
bandit -r rust_crate_pipeline/
safety check

Publishing

# Build and upload to PyPI
python -m build
twine upload dist/*

📦 Project Structure

rust_crate_pipeline/
├── main.py                    # Module entry point
├── unified_llm_processor.py   # Multi-provider LLM support
├── unified_pipeline.py        # Main pipeline orchestration
├── ai_processing.py           # LLM enrichment logic
├── crawl4ai_integration.py    # Web scraping integration
├── progress_monitor.py        # Progress tracking
├── config.py                  # Configuration management
├── core/                      # Core analysis components
│   ├── irl_engine.py         # IRL trust scoring
│   ├── sacred_chain.py       # Sacred Chain analysis
│   └── canon_registry.py     # Canon registry
├── scraping/                  # Web scraping modules
├── utils/                     # Utility functions
└── audits/                    # Audit and validation

🔄 Version 2.2.0 Changes

Fixed Setuptools warnings - Clean build process with no deprecation warnings
Improved requirements management - Synchronized dependencies across all config files
Enhanced error handling - Better exception handling and logging
Updated dependencies - Latest versions with security fixes
Streamlined installation - Simplified setup for both pip and repository users
Sigil Protocol support - Sacred Chain analysis with IRL trust scoring (via python -m rust_crate_pipeline --enable-sigil-protocol)

🐳 Docker Support

The project includes full Docker support for containerized deployment:

# Quick start with docker-compose
docker-compose up --build

# Manual Docker build
docker build -t rust-crate-pipeline .
docker run -e AZURE_OPENAI_API_KEY=your_key rust-crate-pipeline

🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🔧 Troubleshooting

Common Issues

Playwright Installation

# If you get Playwright errors, install browsers
playwright install

LLM Provider Issues

# Check your API keys and endpoints
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your_api_key"

# Verify that the package imports correctly (this does not contact the provider)
python -c "from rust_crate_pipeline.unified_llm_processor import UnifiedLLMProcessor; print('✅ LLM processor ready')"

Memory Issues

# Reduce batch size for low-memory environments
python run_with_llm.py --provider azure --model gpt-4o --batch-size 2
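
A smaller batch size lowers peak memory because fewer crates are held in flight at once. The sketch below shows the general batching pattern only. It makes no claim about how the pipeline batches internally, and batched is an illustrative helper.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Five crates with a batch size of 2 are processed as three batches.
for batch in batched(["tokio", "serde", "rand", "clap", "regex"], 2):
    print(batch)
```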

Permission Issues

# On Linux/macOS, ensure proper permissions
chmod +x run_with_llm.py
chmod +x rust_crate_pipeline/main.py

Getting Help

Issues: GitHub Issues
Documentation: Project Wiki
Discussions: GitHub Discussions
