bongovaad (বঙ্গবাদ)

bongovaad is a Python package for transcribing Bengali audio from YouTube videos. It uses a fine-tuned Whisper model optimized for Bengali speech recognition.

Features

Fine-tuned Model: We've LoRA-tuned the whisper-large-v2 model on the 'bn' subset of Mozilla Common Voice 13, achieving a word error rate (WER) of 57, compared with the WER of 103.4 reported in the original OpenAI Whisper paper (page 23).
SRT Subtitle Generation: Automatically creates SRT subtitle files for video players.
Efficient Audio Processing: Handles audio segmentation for longer videos with progress tracking.
Temporary File Management: Uses temporary directories for clean processing.
Robust Error Handling: Comprehensive error handling and logging.
Command-line Interface: Easy-to-use CLI with multiple options.
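WER is the word-level edit distance between a reference transcript and the model output (substitutions S, deletions D, insertions I) divided by the number of reference words N, times 100. Because insertions are counted, WER can exceed 100, which is how the baseline reaches 103.4. The sketch below illustrates that arithmetic under simple assumptions: whitespace tokenization and no text normalization. It is not the evaluation script used to produce the figures above; published WER numbers usually normalize text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N (unnormalized sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)
```

For example, a two-word reference transcribed as three entirely wrong words costs two substitutions plus one insertion, giving 3 / 2 = 150.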

Requirements

Python 3.8 or higher
CUDA-compatible GPU (recommended for faster processing)
ffmpeg

Installation

1. Install ffmpeg

Ubuntu/Debian:

sudo apt install ffmpeg -y

macOS (using Homebrew):

brew install ffmpeg

Windows (using Chocolatey):

choco install ffmpeg
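To confirm that ffmpeg is installed and visible on your PATH before running bongovaad, a small check like the following can help. This is a convenience sketch using only the Python standard library; the function name `ffmpeg_version` is hypothetical and not part of bongovaad.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    # The first line typically reads "ffmpeg version <x.y> Copyright ..."
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version or "ffmpeg not found on PATH; install it before using bongovaad")
```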

2. Install bongovaad

pip install bongovaad

Usage

Basic Usage

bongovaad --url "https://www.youtube.com/watch?v=VIDEO_ID"
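The video ID in the URL is what the Output Files section uses to name the results (VIDEO_ID.txt, VIDEO_ID.srt). If you want to predict those file names in a script, a sketch like this extracts the ID from the two most common YouTube URL shapes. The helper `youtube_video_id` is hypothetical, and whether bongovaad parses URLs by exactly these rules is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL shapes (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links: https://youtu.be/VIDEO_ID
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=VIDEO_ID
            return parse_qs(parsed.query).get("v", [None])[0]
    return None
```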

Advanced Options

bongovaad --url "https://www.youtube.com/watch?v=VIDEO_ID" \
          --segment-length 10 \
          --output-format both \
          --device cuda \
          --verbose

Command-line Arguments

Argument	Description	Default
`--url`	YouTube URL for audio transcription	(Required)
`--segment-length`	Length of each audio segment in seconds	8
`--output-format`	Output format (txt, srt, or both)	both
`--use-8bit`	Use 8-bit quantization for the model	True
`--device`	Device to use for inference (auto, cuda, cpu)	auto
`--verbose`	Enable verbose logging	False
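The table does not spell out what `--device auto` selects. A plausible reading is "use CUDA if a GPU is available, otherwise fall back to CPU"; the sketch below implements that assumed behavior with PyTorch's `torch.cuda.is_available()`. The function `resolve_device` is hypothetical and may differ from what bongovaad actually does.

```python
import importlib.util


def resolve_device(requested: str = "auto") -> str:
    """Map a --device choice to a concrete device (assumed 'auto' behavior)."""
    if requested not in ("auto", "cuda", "cpu"):
        raise ValueError(f"unsupported device: {requested!r}")
    if requested != "auto":
        return requested
    # Assumption: "auto" prefers CUDA when PyTorch is installed and sees a GPU.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"
```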

Output Files

Depending on `--output-format`, the tool generates one or both of the following output files:

Text File (VIDEO_ID.txt): Contains the full transcription text.
SRT File (VIDEO_ID.srt): Contains time-coded subtitles compatible with video players.
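An SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the subtitle text and a blank line. The sketch below renders (start, end, text) segments in that format. It assumes each fixed-length audio segment becomes one cue; the helpers `srt_timestamp` and `to_srt` are hypothetical and not bongovaad's internal API.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT cues numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Joining with "\n" leaves exactly one blank line between cues
    return "\n".join(blocks)
```

With the default 8-second segments, the first two cues would span 00:00:00,000 to 00:00:08,000 and 00:00:08,000 to 00:00:16,000.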

Python API

You can also use bongovaad as a Python library:

from bongovaad import BongoVaadTranscriber

# Initialize the transcriber
transcriber = BongoVaadTranscriber(use_8bit=True, device="cuda")

# Transcribe a YouTube video
output_files = transcriber.transcribe(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    segment_length_seconds=10,
    output_format="both"
)

# Print output file paths
print(f"Text file: {output_files['txt']}")
print(f"SRT file: {output_files['srt']}")

Performance Considerations

Processing time depends on video length and hardware capabilities.
Using a GPU significantly improves performance.
Longer segment lengths may improve speed but could reduce accuracy for complex audio.
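The speed side of the segment-length trade-off comes down to how many segments the model must process: for fixed, non-overlapping windows, that is ceil(duration / segment length). A 10-minute (600 s) video yields 75 segments at the default 8 s and 60 segments at 10 s. The sketch below computes the window boundaries under that assumption; bongovaad's actual segmentation (for example, any overlap or silence-based splitting) may differ, and `segment_bounds` is a hypothetical name.

```python
import math
from typing import List, Tuple


def segment_bounds(duration_s: float, segment_length_s: float = 8.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive fixed-length windows; the last may be shorter."""
    if segment_length_s <= 0:
        raise ValueError("segment_length_s must be positive")
    count = math.ceil(duration_s / segment_length_s)
    # Compute each start from its index to avoid accumulating floating-point error
    return [
        (i * segment_length_s, min((i + 1) * segment_length_s, duration_s))
        for i in range(count)
    ]
```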

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Here's how you can contribute:

Fork the repository
Create a feature branch: git checkout -b feature-name
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin feature-name
Submit a pull request

Acknowledgements

PEFT for parameter-efficient fine-tuning
Whisper for the base ASR model
Mozilla Common Voice for the training dataset

bongovaad
Release 0.5.0
