Python SDK available

World's Fastest Model Streamer

Load models faster by overlapping download, RAM staging, and CUDA transfer.
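Overlapping stages instead of running them back to back is a classic producer/consumer pipeline. The toy sketch below is not Vajra's implementation. It simulates a download stage and a GPU-transfer stage with `time.sleep`, and uses a bounded `queue.Queue` as a stand-in for the RAM staging buffer. All function names, chunk counts, and timings are hypothetical.

```python
import queue
import threading
import time

SENTINEL = None  # marks the end of the chunk stream


def download_stage(n_chunks: int, staging: queue.Queue, log: list) -> None:
    """Simulated download: fetch chunks one by one and stage them in RAM."""
    for i in range(n_chunks):
        time.sleep(0.01)  # stand-in for network I/O
        chunk = bytes([i % 256]) * 1024  # fake 1 KiB chunk
        log.append(("downloaded", i))
        staging.put((i, chunk))  # blocks while the staging buffer is full
    staging.put(SENTINEL)


def transfer_stage(staging: queue.Queue, log: list, out: dict) -> None:
    """Simulated GPU transfer: copy each chunk as soon as it is staged."""
    while True:
        item = staging.get()
        if item is SENTINEL:
            break
        i, chunk = item
        time.sleep(0.005)  # stand-in for a host-to-device copy
        out[i] = chunk
        log.append(("transferred", i))


def run_pipeline(n_chunks: int = 8, buffer_chunks: int = 4) -> tuple[list, dict]:
    """Run both stages concurrently; return the event log and the received chunks."""
    staging: queue.Queue = queue.Queue(maxsize=buffer_chunks)  # bounded RAM buffer
    log: list = []
    out: dict = {}
    producer = threading.Thread(target=download_stage, args=(n_chunks, staging, log))
    consumer = threading.Thread(target=transfer_stage, args=(staging, log, out))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return log, out


if __name__ == "__main__":
    events, received = run_pipeline()
    for event in events:
        print(event)
```

Because the transfer thread picks up each chunk as soon as it is staged, the log shows chunk 0 reaching the simulated GPU well before the last chunk finishes downloading. The bounded queue also caps how much RAM the staging step can hold at once.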


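Model Load Time (lower is better)

Vajra (Hugging Face Hub source): 8.22 s
NVIDIA Run:AI Model Streamer (S3 source): 15.85 s
Hugging Face hf_transfer (Hugging Face Hub source): 36.88 s

Performance delta vs. hf_transfer: 350% (4.5× faster)

Measured from request start until the model weights are staged in memory, using the same model, machine, and network. Vajra's GPU loading began at 0.65 s.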

Installation & Usage

Install the Python SDK

Install the Python package and start streaming models from Python.

Load Hugging Face .safetensors models directly into PyTorch tensors with a single call.
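The .safetensors format suits streaming. A file starts with an 8-byte little-endian header length, followed by a JSON header that maps each tensor name to its dtype, shape, and byte offsets, followed by the raw tensor data. Once a loader has read the header, it knows exactly which byte range holds each tensor. The standard-library sketch below parses that layout. It is not Vajra's code, the helper names are hypothetical, and this page does not say how Vajra uses the header internally.

```python
import json
import struct


def read_safetensors_header(buf: bytes) -> tuple[dict, int]:
    """Parse a .safetensors header; return (tensor table, offset where data begins)."""
    (header_len,) = struct.unpack("<Q", buf[:8])  # unsigned 64-bit little-endian size
    header = json.loads(buf[8:8 + header_len])  # JSON: name -> dtype/shape/data_offsets
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return header, 8 + header_len


def tensor_byte_range(header: dict, data_start: int, name: str) -> tuple[int, int]:
    """Absolute [begin, end) byte range of one tensor within the file."""
    begin, end = header[name]["data_offsets"]  # relative to the start of the data section
    return data_start + begin, data_start + end


if __name__ == "__main__":
    # Build a tiny in-memory file holding one 2x2 float32 tensor named "w".
    payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    meta = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
    raw_header = json.dumps(meta).encode()
    blob = struct.pack("<Q", len(raw_header)) + raw_header + payload

    header, data_start = read_safetensors_header(blob)
    begin, end = tensor_byte_range(header, data_start, "w")
    print(header["w"]["shape"], struct.unpack("<4f", blob[begin:end]))
```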


$ pip install vajra-streamer

from vajra import VajraStreamer, StreamConfig

config = StreamConfig(
    auth_token="hf_...",  # Hugging Face access token; Meta-Llama-3-8B is a gated repo
    chunk_size_mb=64,
    chunk_workers=16,
    gpu_workers=3,
    disable_cache=True,
)

url = "meta-llama/Meta-Llama-3-8B"  # Hugging Face Hub repo ID

with VajraStreamer(config) as streamer:
    # Stream the model's .safetensors weights into PyTorch tensors, keyed by name
    tensors = streamer.load(url)
    print(f"Loaded {len(tensors)} tensors")
    print(tensors["model.layers.0.self_attn.q_proj.weight"].shape)
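This page does not document what `chunk_size_mb` and `chunk_workers` control, so the following is only a guess from their names. A common design for parallel downloaders splits each file into fixed-size byte ranges and fetches them concurrently, for example with HTTP `Range` requests. The helpers `plan_ranges` and `parallel_fetch` are hypothetical. The sketch treats MB as MiB and uses an injected `fetch` function in place of real HTTP.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def plan_ranges(file_size: int, chunk_bytes: int) -> list[tuple[int, int]]:
    """Split a file into inclusive (start, end) byte ranges, as in an HTTP Range header."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        (start, min(start + chunk_bytes, file_size) - 1)
        for start in range(0, file_size, chunk_bytes)
    ]


def parallel_fetch(
    ranges: list[tuple[int, int]],
    fetch: Callable[[tuple[int, int]], bytes],
    chunk_workers: int = 16,
) -> bytes:
    """Fetch all ranges concurrently and reassemble them in file order."""
    with ThreadPoolExecutor(max_workers=chunk_workers) as pool:
        parts = list(pool.map(fetch, ranges))  # map() yields results in input order
    return b"".join(parts)


if __name__ == "__main__":
    chunk_size_mb = 64
    ranges = plan_ranges(150 * 1024 * 1024, chunk_size_mb * 1024 * 1024)
    for start, end in ranges:
        print(f"Range: bytes={start}-{end}")
```

In a real downloader, `fetch` would issue a GET request with a `Range: bytes=start-end` header. `pool.map` returns the parts in file order even when chunks complete out of order.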

Benchmarked with Meta Llama 3 8B

The numbers speak for themselves.

Hugging Face Hub Source

Vajra vs Hugging Face Model Loader

Load time in seconds (lower is better): Vajra 8.22 s, with GPU transfer beginning at 0.65 s; Hugging Face hf_transfer 36.88 s.

Both tools load the same 14.96 GB of Llama 3 8B .safetensors files from the Hugging Face Hub. Vajra begins GPU transfer at 0.65 s, before the download finishes, while hf_transfer must complete the full download first.

348.9% (4.49× faster)
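An idealized timing model shows why starting the GPU copy early matters. Assume `n` equal-sized chunks, each taking `t_download` to fetch and `t_transfer` to copy to the GPU, with unlimited staging memory. Without overlap the two costs add. With a two-stage pipeline, the first chunk fills the pipeline and the slower stage then sets the pace. The function names and the per-chunk costs in the demo are illustrative, not measurements from this benchmark.

```python
def sequential_time(n_chunks: int, t_download: float, t_transfer: float) -> float:
    """Download every chunk first, then copy every chunk to the GPU."""
    return n_chunks * t_download + n_chunks * t_transfer


def pipelined_time(n_chunks: int, t_download: float, t_transfer: float) -> float:
    """Two-stage pipeline: the first chunk fills it, then the slower stage sets the pace."""
    if n_chunks == 0:
        return 0.0
    return t_download + t_transfer + (n_chunks - 1) * max(t_download, t_transfer)


if __name__ == "__main__":
    # Hypothetical per-chunk costs, not taken from the benchmark above.
    n, d, g = 4, 2.0, 1.0
    print(sequential_time(n, d, g), pipelined_time(n, d, g))  # prints: 12.0 9.0
```

When downloading is the bottleneck (`t_download >= t_transfer`), the pipelined total simplifies to `n * t_download + t_transfer`. Almost all GPU-transfer time then hides behind the download, and only the last chunk's copy is left exposed.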

S3 Source

Vajra vs Run:AI Model Streamer

Load time in seconds (lower is better): Vajra 12.97 s, with GPU transfer beginning at 2.65 s; NVIDIA Run:AI Model Streamer 15.85 s.

Both Vajra and Run:AI Model Streamer use S3 as the source. Vajra begins GPU transfer at 2.65s, while Run:AI must complete the full download first.

22.2% (1.22× faster)
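The headline figures follow from the load times. The speedup ratio is the baseline time divided by Vajra's time, and the percentage is that ratio minus one, times 100. Plugging in the rounded times shown on this page reproduces 4.49× for the Hugging Face Hub comparison and 1.22× (22.2%) for S3. For Hugging Face, the rounded times give about 348.7% rather than the published 348.9%, presumably because the published figure came from unrounded timings. The helper name `speedup` is ours.

```python
def speedup(baseline_s: float, candidate_s: float) -> tuple[float, float]:
    """Return (times faster, percent faster) of the candidate vs. the baseline load time."""
    ratio = baseline_s / candidate_s
    return ratio, (ratio - 1.0) * 100.0


if __name__ == "__main__":
    comparisons = [
        ("Hugging Face Hub: Vajra vs. hf_transfer", 36.88, 8.22),
        ("S3: Vajra vs. Run:AI Model Streamer", 15.85, 12.97),
    ]
    for label, baseline, vajra in comparisons:
        ratio, pct = speedup(baseline, vajra)
        print(f"{label}: {ratio:.2f}x faster ({pct:.1f}%)")
```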

Ready to optimize your model loading?

Get started now

Python SDK available on GitHub

World's FastestModel Streamer

Installation& Usage

Install the Python SDK

The numbersspeak for themselves.

Vajra vs Hugging Face Model Loader

Vajra vs Run:AI Model Streamer

Ready to optimize your model loading?

World's Fastest
Model Streamer

Installation
& Usage

The numbers
speak for themselves.