Configuration

StreamConfig controls authentication, chunking, worker counts, cache behavior, and native logging.

from vajra import StreamConfig

config = StreamConfig(
    auth_token="",
    chunk_size_mb=64,
    chunk_workers=16,
    gpu_workers=3,
    disable_cache=False,
    log_level=4,
)

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| auth_token | str | "" | Hugging Face API token. Required for gated or private repos. |
| chunk_size_mb | int | 64 | Size of each download chunk in megabytes. Converted to bytes before native code receives it. |
| chunk_workers | int | 16 | Number of parallel download workers. |
| gpu_workers | int | 3 | Number of workers dedicated to GPU copy work. |
| disable_cache | bool | False | When True, disables the native on-disk cache. |
| log_level | int | 4 | vibe.d log verbosity. |

Chunk Size

chunk_size_mb is converted before crossing into native code:

chunk_size = chunk_size_mb * 1024 * 1024

Larger chunks can reduce per-request overhead, but they also increase the amount of data handled per chunk. Smaller chunks may be easier to schedule but can increase coordination overhead.
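
To make the trade-off concrete, the sketch below applies the same conversion and estimates how many chunks a file would need. The helper names chunk_size_bytes and chunk_count are hypothetical and not part of vajra, and the estimate assumes files are split into fixed-size chunks with a possibly smaller final chunk, which this page does not confirm.

```python
def chunk_size_bytes(chunk_size_mb: int) -> int:
    """Convert megabytes to bytes using the documented formula."""
    return chunk_size_mb * 1024 * 1024


def chunk_count(file_size_bytes: int, chunk_size_mb: int) -> int:
    """Estimate the chunks needed to cover a file (hypothetical helper).

    Assumes fixed-size chunks with a partial final chunk; uses integer
    ceiling division to avoid floating-point rounding.
    """
    size = chunk_size_bytes(chunk_size_mb)
    return -(-file_size_bytes // size)


five_gib = 5 * 1024**3
print(chunk_size_bytes(64))       # 67108864
print(chunk_count(five_gib, 64))  # 80
print(chunk_count(five_gib, 16))  # 320
```

Under these assumptions, dropping from 64 MB to 16 MB chunks quadruples the number of requests for the same file.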

Worker Counts

The native thread pool size is derived from:

chunk_workers + gpu_workers + 2

chunk_workers controls download parallelism. gpu_workers controls GPU copy work. Raising either can improve throughput, but too much parallelism can cause network contention, scheduling overhead, or higher peak memory use.
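
As a quick check before changing worker counts, the formula above can be evaluated directly. The function name native_pool_size is hypothetical and only mirrors the documented arithmetic; it does not call into vajra.

```python
def native_pool_size(chunk_workers: int, gpu_workers: int) -> int:
    """Mirror the documented formula: chunk_workers + gpu_workers + 2."""
    return chunk_workers + gpu_workers + 2


# With the documented defaults (16 download workers, 3 GPU workers):
print(native_pool_size(16, 3))  # 21

# Doubling the download workers adds 16 threads to the pool:
print(native_pool_size(32, 3))  # 37
```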

Cache Behavior

disable_cache=False (the default) keeps native caching enabled.

disable_cache=True skips the native cache. This is useful when benchmarking fresh loads or when you do not want downloaded chunks persisted.

Log Levels

| Value | Level |
| --- | --- |
| 4 | info |
| 3 | diagnostic |
| 2 | debug |
| 1 | debugV |
| 0 | trace |

Lower values are more verbose.
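
If bare integers are hard to read in calling code, one option is a small named mapping. The LogLevel enum below is a hypothetical convenience, not something vajra provides; its values are copied from the table above, and the resulting integer is what would be passed as log_level.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Hypothetical names for the documented log_level values."""
    TRACE = 0
    DEBUG_V = 1
    DEBUG = 2
    DIAGNOSTIC = 3
    INFO = 4


def is_more_verbose(a: LogLevel, b: LogLevel) -> bool:
    """Lower values are more verbose, so compare numerically."""
    return a < b


# The plain int is what a log_level field would receive.
print(int(LogLevel.DEBUG))                             # 2
print(is_more_verbose(LogLevel.TRACE, LogLevel.INFO))  # True
```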

If you need to understand why a full Hugging Face URL does not restrict loading to one shard, read Model Inputs.