High-Performance Model Streamer
Load models faster by overlapping download, RAM staging,
and CUDA transfer.
Model Load Time
Measured from request start until the model weights are staged in memory, using the same model, machine, and network for each run.
GPU loading began at 3.20s
Installation & Usage
Install the Python SDK
Install the Python package and start streaming models from Python.
Load Hugging Face .safetensors models directly into PyTorch tensors with one Python call.
Benchmarks
Benchmarked On Meta Llama 3 8B
Transfer Metric
Total Weight Transfer Time
Vajra moved 14.96GB of Llama 3 8B .safetensors weights through the streaming pipeline 59% faster than hf_transfer in this run (20.14s versus 32.04s).
The comparison baseline was hf_transfer, Hugging Face's Rust-backed downloader (HF_HUB_ENABLE_HF_TRANSFER=1).
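The 59% figure can be checked against the timings reported for this run. The sketch below assumes the figure compares the 20.14s Vajra completion time with the 32.04s hf_transfer completion time over the same 14.96GB; the variable names are illustrative and not part of any SDK.

```python
# Figures quoted in this benchmark run.
SIZE_GB = 14.96          # Llama 3 8B .safetensors weights
VAJRA_S = 20.14          # Vajra download/RAM staging completion time
HF_TRANSFER_S = 32.04    # hf_transfer download completion time

# Effective throughput of each path in GB/s.
vajra_gbps = SIZE_GB / VAJRA_S
hf_gbps = SIZE_GB / HF_TRANSFER_S

# "Percent faster" read as a throughput ratio (assumption).
percent_faster = (HF_TRANSFER_S / VAJRA_S - 1) * 100

# The same gap expressed as a reduction in wall-clock time.
time_saved_pct = (1 - VAJRA_S / HF_TRANSFER_S) * 100

print(f"Vajra:       {vajra_gbps:.2f} GB/s")
print(f"hf_transfer: {hf_gbps:.2f} GB/s")
print(f"Faster by:   {percent_faster:.0f}%")
print(f"Time saved:  {time_saved_pct:.0f}%")
```

Under that reading, 59% faster corresponds to roughly 37% less wall-clock time for the same transfer.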
Streaming Timeline
GPU transfer starts before the download finishes
In this benchmark, Vajra started GPU transfer at 3.20s, while the 14.96GB download/RAM staging path completed in 20.14s. The hf_transfer download completed in 32.04s.
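The overlap described above can be illustrated with a small three-stage pipeline. This is a minimal sketch using only the Python standard library, not Vajra's implementation: the stage functions, chunk count, and queue depth are hypothetical, and byte buffers stand in for network reads, host staging buffers, and host-to-device copies.

```python
import queue
import threading

CHUNKS = 8        # hypothetical number of weight chunks
SENTINEL = None   # marks the end of the stream


def run_pipeline(num_chunks=CHUNKS, depth=2):
    """Run three overlapped stages and return the ordered event log."""
    events = []
    lock = threading.Lock()
    to_stage = queue.Queue(maxsize=depth)    # download -> RAM staging
    to_device = queue.Queue(maxsize=depth)   # RAM staging -> device transfer

    def log(stage_name, index):
        with lock:
            events.append((stage_name, index))

    def download():
        for i in range(num_chunks):
            chunk = bytes([i % 256]) * 1024  # stand-in for a network read
            log("download", i)
            to_stage.put((i, chunk))         # blocks when staging falls behind
        to_stage.put(SENTINEL)

    def stage():
        while (item := to_stage.get()) is not SENTINEL:
            i, chunk = item
            staged = bytearray(chunk)        # stand-in for a host staging buffer
            log("stage", i)
            to_device.put((i, staged))
        to_device.put(SENTINEL)

    def transfer():
        while (item := to_device.get()) is not SENTINEL:
            i, staged = item
            _ = memoryview(staged)           # stand-in for a host-to-device copy
            log("transfer", i)

    threads = [threading.Thread(target=f) for f in (download, stage, transfer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_pipeline()
first_transfer = events.index(("transfer", 0))
last_download = events.index(("download", CHUNKS - 1))
print("transfer began before download finished:", first_transfer < last_download)
```

Because the queues are bounded, the download stage cannot run far ahead of the later stages, so the first transfer is always logged before the last chunk is downloaded. A real streamer would replace the stand-ins with ranged HTTP reads, pinned host memory, and asynchronous device copies.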