VRAM Tracking

The native library can log CUDA memory allocation activity to a CSV file.

It intercepts:

  • cudaMalloc
  • cudaFree
  • cuMemAlloc
  • cuMemFree

Tracking is enabled by default when the native library is loaded.

Log File Location

Set VAJRA_VRAM_LOG to choose the output path:

VAJRA_VRAM_LOG=/tmp/my_run.csv python your_script.py

If VAJRA_VRAM_LOG is not set, the default path is:

/tmp/vajra_vram_allocs.csv
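
If a script needs to find the log after a run, the lookup rule above can be mirrored in a few lines of Python. This is a minimal sketch: the helper name resolve_vram_log_path is hypothetical and not part of the library, and treating an empty VAJRA_VRAM_LOG value as unset is an assumption of the sketch.

```python
import os

# Default path documented above.
DEFAULT_VRAM_LOG = "/tmp/vajra_vram_allocs.csv"


def resolve_vram_log_path(environ=None):
    """Return the log path implied by VAJRA_VRAM_LOG, or the documented default.

    Hypothetical helper for illustration; it only mirrors the documented rule.
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty value is treated the same as an unset variable.
    return env.get("VAJRA_VRAM_LOG") or DEFAULT_VRAM_LOG


if __name__ == "__main__":
    print(resolve_vram_log_path())
```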

The CSV columns are:

Column     Meaning
TIMESTAMP  Unix timestamp for the allocation event.
API        CUDA API that was intercepted.
PTR        Device pointer.
SIZE       Allocation size in bytes. Frees use 0.
BACKTRACE  Native call stack frames separated by semicolons.
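
As an illustration of how these columns fit together, the sketch below replays a log and reports which pointers were allocated but never freed. It relies on the documented rule that frees use a SIZE of 0. Several details are assumptions: the column order matches the table, the file may or may not start with a header row (the code tolerates both), and PTR is compared as an opaque string. The sample rows are invented for the example and are not real tracker output.

```python
import csv
import io

# Column names from the table above; the on-disk order is assumed to match.
FIELDS = ["TIMESTAMP", "API", "PTR", "SIZE", "BACKTRACE"]


def live_allocations(lines):
    """Return {ptr: size_in_bytes} for allocations with no later free."""
    live = {}
    for row in csv.DictReader(lines, fieldnames=FIELDS):
        if row["TIMESTAMP"] == "TIMESTAMP":
            continue  # tolerate a header row, if the file has one
        size = int(row["SIZE"])
        if size > 0:
            live[row["PTR"]] = size      # allocation event
        else:
            live.pop(row["PTR"], None)   # free event (SIZE is 0)
    return live


# Made-up sample data in the assumed layout.
sample = io.StringIO(
    "TIMESTAMP,API,PTR,SIZE,BACKTRACE\n"
    "1700000000.1,cudaMalloc,0x7f0000000000,1048576,frame_a;frame_b\n"
    "1700000000.2,cudaMalloc,0x7f0000100000,4096,frame_a;frame_c\n"
    "1700000000.3,cudaFree,0x7f0000100000,0,frame_a;frame_c\n"
)
live = live_allocations(sample)
print(live, sum(live.values()))
```

To run it against a real log, pass an open file instead of the sample, for example open("/tmp/vajra_vram_allocs.csv", newline="").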

Pause and Resume Tracking

Use the static helpers on VajraStreamer:

from vajra import VajraStreamer

VajraStreamer.pause_vram_tracking()
# Allocations here are not logged.
VajraStreamer.resume_vram_tracking()

This is useful when you want to exclude known allocations from the log.
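
If the paused region can raise an exception, tracking would stay paused unless resume is called in a finally block. One way to guarantee that is a small context manager. This is a sketch, not part of the library: tracking_paused is a hypothetical name, and it takes the pause and resume callables as arguments so it makes no assumptions about the helpers beyond what is shown above.

```python
from contextlib import contextmanager


@contextmanager
def tracking_paused(pause, resume):
    """Call pause() on entry and resume() on exit, even if the body raises."""
    pause()
    try:
        yield
    finally:
        resume()


# Intended use with the helpers shown above (requires the vajra package):
#
#   from vajra import VajraStreamer
#
#   with tracking_paused(VajraStreamer.pause_vram_tracking,
#                        VajraStreamer.resume_vram_tracking):
#       ...  # allocations here are not logged
```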

What Could Go Wrong

The tracker hooks CUDA allocation functions at the native library level. If another library also hooks or wraps CUDA allocation APIs, logs can be incomplete or harder to interpret.

The log records allocation activity, not high-level ownership. Use the BACKTRACE column to infer whether an allocation came from model weights, temporary buffers, or other CUDA work.
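
One way to start that inference is to total allocated bytes by a chosen frame of each backtrace. The sketch below assumes rows have already been parsed into dictionaries keyed by the column names. The function name bytes_by_frame and the frame names in the sample are hypothetical, and whether the innermost or outermost frame comes first in BACKTRACE is not specified here, so frame_index may need adjusting for real logs.

```python
from collections import Counter


def bytes_by_frame(rows, frame_index=0):
    """Sum SIZE per backtrace frame at frame_index, ignoring frees (SIZE 0)."""
    totals = Counter()
    for row in rows:
        size = int(row["SIZE"])
        if size == 0:
            continue  # frees carry no size, so they do not contribute
        frames = row["BACKTRACE"].split(";")
        key = frames[frame_index] if frame_index < len(frames) else "<unknown>"
        totals[key] += size
    return totals


# Made-up rows; real frame strings will look different.
rows = [
    {"SIZE": "1048576", "BACKTRACE": "cudaMalloc;load_weights;main"},
    {"SIZE": "4096", "BACKTRACE": "cudaMalloc;scratch_buffer;main"},
    {"SIZE": "2048", "BACKTRACE": "cudaMalloc;load_weights;main"},
    {"SIZE": "0", "BACKTRACE": "cudaFree;scratch_buffer;main"},
]
totals = bytes_by_frame(rows, frame_index=1)
for frame, total in totals.most_common():
    print(frame, total)
```

Grouping by a single frame is coarse. Grouping by a prefix of several frames gives finer attribution at the cost of more distinct keys.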