Original code from FasterTransformer / TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/tree/main/cpp/tensorrt_llm/kernels
Adapted to support a different quantization scheme.
Quantized MatMul in CUDA with a PyTorch interface
pip install quant-matmul==1.1.0.post1
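To make the idea concrete, here is a minimal sketch of what a weight-only quantized matmul computes. This is an illustration only, not the library's API: real CUDA kernels (like the TensorRT-LLM ones referenced above) fuse dequantization into the GEMM, and the exact quantization scheme here (per-output-channel symmetric int8) is an assumption chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # activations (fp32)
w = rng.standard_normal((8, 16)).astype(np.float32)  # weights to be quantized

# Per-output-channel symmetric int8 quantization: w ~ w_q * scale
scale = np.abs(w).max(axis=0, keepdims=True) / 127.0
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Quantized matmul: int8 weights dequantized via per-channel scales.
# A fused kernel would do this inside the GEMM instead of materializing
# the dequantized weight matrix.
y_quant = x @ (w_q.astype(np.float32) * scale)
y_ref = x @ w
print(np.max(np.abs(y_quant - y_ref)))  # small quantization error
```

The fused-kernel version avoids ever writing the fp32 weight matrix to memory, which is where the bandwidth savings of weight-only quantization come from.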