Model Inputs

VajraStreamer.load() accepts either a Hugging Face repo id or a huggingface.co URL.

# Repo id. This is the clearest form.
tensors = streamer.load("meta-llama/Meta-Llama-3-8B")

# Full Hugging Face URL. Only the repo id is used, so this still loads every shard.
tensors = streamer.load(
    "https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/model-00001-of-00004.safetensors"
)

Repo ID Is the Real Input

What you might expect: passing a URL to a specific shard streams just that shard.

The tricky part: the native resolver extracts only the repo id (owner/model) from a Hugging Face URL. It then queries the Hugging Face model API, finds every .safetensors file in that repo, and loads all of them.

So these two inputs target the same model repo:

streamer.load("meta-llama/Meta-Llama-3-8B")

streamer.load(
    "https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/model-00001-of-00004.safetensors"
)

The file path after /resolve/main/ is ignored. It does not restrict loading to that one shard.
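
To make the equivalence concrete, here is a minimal Python sketch of the repo id extraction described above. It is not the native resolver's code: extract_repo_id is a hypothetical helper, and it assumes the repo id is simply the first two path segments of the URL.

```python
from urllib.parse import urlparse


def extract_repo_id(model_input: str) -> str:
    """Return 'owner/model' for a repo id or a Hugging Face URL.

    Hypothetical helper for illustration; the real resolver is native code.
    """
    # Anything without a scheme is treated as a repo id already.
    if "://" not in model_input:
        return model_input

    # Keep the first two non-empty path segments and drop the rest,
    # including /resolve/main/<file>.
    parts = [p for p in urlparse(model_input).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no owner/model in URL: {model_input!r}")
    return "/".join(parts[:2])


repo_id = extract_repo_id(
    "https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/model-00001-of-00004.safetensors"
)
```

Both forms reduce to the same string, which is why the shard name in the URL has no effect on what gets loaded.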

Supported Hosts

Only huggingface.co URLs are supported. Other HTTP hosts are rejected during model resolution and surface as ConnectionError in Python.
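
If you accept model inputs from users, you may want to reject unsupported hosts before calling load(). The sketch below is one way to do that. is_supported_input is a hypothetical helper, and the exact-match host rule is an assumption: the resolver's precise rules for subdomains and schemes are not documented here.

```python
from urllib.parse import urlparse

# Assumption: only this exact host is accepted.
SUPPORTED_HOST = "huggingface.co"


def is_supported_input(model_input: str) -> bool:
    """Return True for repo ids and for URLs on the supported host."""
    # Inputs without a scheme are treated as repo ids, not URLs.
    if "://" not in model_input:
        return True
    # urlparse lowercases the hostname, so the comparison is case-insensitive.
    return urlparse(model_input).hostname == SUPPORTED_HOST
```

A check like this only gives an earlier, clearer error. The ConnectionError raised by load() remains the authoritative signal.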

Gated Models

For gated or private repos, pass a Hugging Face token through StreamConfig.auth_token:

config = StreamConfig(auth_token="hf_...")

with VajraStreamer(config) as streamer:
    tensors = streamer.load("meta-llama/Meta-Llama-3-8B")

If the token is missing or invalid, resolution can fail with ConnectionError.
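
Because a missing token and an invalid token both surface as ConnectionError, it can help to add context when re-raising. The following is a rough sketch, not part of the library: token_from_env and load_with_auth_hint are hypothetical helpers, and HF_TOKEN is the environment variable used by Hugging Face's own tooling, which this library is not assumed to read by itself.

```python
import os
from typing import Any, Callable, Optional


def token_from_env(var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in the environment, or None if unset or blank."""
    token = os.environ.get(var, "").strip()
    return token or None


def load_with_auth_hint(
    load: Callable[[str], Any], model: str, token: Optional[str]
) -> Any:
    """Call load(model) and re-raise ConnectionError with an auth hint."""
    try:
        return load(model)
    except ConnectionError as exc:
        if token:
            hint = "a token was set but may be invalid or lack access"
        else:
            hint = "no token was set"
        raise ConnectionError(f"could not resolve {model!r}: {hint}") from exc


# Intended use, with the classes shown above:
#   token = token_from_env()
#   with VajraStreamer(StreamConfig(auth_token=token)) as streamer:
#       tensors = load_with_auth_hint(streamer.load, "meta-llama/Meta-Llama-3-8B", token)
```

The hint is only a guess at the cause, since network failures raise the same exception type.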
