vision-llama

Vision Llama - PyTorch


Keywords
artificial intelligence, deep learning, optimizers, prompt engineering, ai, deep-learning, multi-modal, vision-models, vision-transformers, vit
License
MIT
Install
pip install vision-llama==0.0.8

Documentation

VisionLLaMA

Implementation of VisionLLaMA from the paper "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta. Paper: https://arxiv.org/abs/2403.00522

Install

$ pip install vision-llama

Usage

import torch
from vision_llama.main import VisionLlama

# Example input: a batch of one 224x224 RGB image
x = torch.randn(1, 3, 224, 224)

# Create an instance of the VisionLlama model with the specified parameters
model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)


# Run a forward pass and print the output tensor
print(model(x))
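
If the forward pass returns class logits of shape (batch, num_classes), which is an assumption about this interface rather than something stated here, predictions can be read off with a plain argmax. Continuing from the snippet above:

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(x)  # assumed shape: (1, 1000), one logit per class

# Index of the highest-scoring class per image (plain PyTorch, not a library helper)
predicted_class = logits.argmax(dim=-1)
print(predicted_class)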

License

MIT

Citation

@misc{chu2024visionllama,
    title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks}, 
    author={Xiangxiang Chu and Jianlin Su and Bo Zhang and Chunhua Shen},
    year={2024},
    eprint={2403.00522},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Todo

  • Implement AS2DRoPE; the current implementation is rough, and axial rotary embeddings may be used instead (see the sketch after this list)
  • Improve the GSA attention; a first implementation exists but needs work
  • Add an ImageNet training script with distributed training support
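
For the first item, one common way to build axial rotary embeddings for a 2D patch grid is to apply standard 1D rotary embeddings independently along the row and column axes, each over half of the head dimension. The sketch below is a minimal, self-contained illustration of that idea under those assumptions; it is not the implementation in this repo, and all names in it are hypothetical.

import torch


def rotate_half(x):
    # Split the last dimension into two halves and rotate: (a, b) -> (-b, a)
    a, b = x.chunk(2, dim=-1)
    return torch.cat((-b, a), dim=-1)


def axial_rope(x, height, width, base=10000.0):
    # x: (batch, heads, height * width, head_dim); head_dim must be divisible by 4
    batch, heads, seq, head_dim = x.shape
    half = head_dim // 2

    # Shared frequency bands, as in standard 1D RoPE
    freqs = 1.0 / (base ** (torch.arange(0, half, 2, dtype=torch.float32) / half))

    # Rotation angles for the row axis and the column axis
    h_angles = torch.einsum("i,j->ij", torch.arange(height, dtype=torch.float32), freqs)
    w_angles = torch.einsum("i,j->ij", torch.arange(width, dtype=torch.float32), freqs)

    # Broadcast both over the full grid, then flatten to the token sequence
    h_angles = h_angles[:, None, :].expand(height, width, -1)
    w_angles = w_angles[None, :, :].expand(height, width, -1)
    angles = torch.cat((h_angles, w_angles), dim=-1).reshape(seq, half)
    angles = torch.cat((angles, angles), dim=-1)  # (seq, head_dim)

    # Rotate each channel pair by its row angle or column angle
    return x * angles.cos() + rotate_half(x) * angles.sin()


# Example: 14x14 patch grid, 12 heads of dimension 64
q = torch.randn(1, 12, 14 * 14, 64)
print(axial_rope(q, height=14, width=14).shape)  # torch.Size([1, 12, 196, 64])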