Trouble duplicating space

#7
by SETRusty - opened

Just wanted to check in to compare troubleshooting approaches. When I try to duplicate this Space onto Nvidia GPU hardware (i.e. anything other than ZeroGPU), I keep hitting an error indicating a shortage of reserved memory. The error message's inline recommendation is to set the variable below to true in order to avoid fragmentation of GPU memory. Is this a change best made at the project level in the code repo, or is there an easier way to set it from Hugging Face's settings panel?

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
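If you'd rather set it in code than in the Space settings, a minimal sketch (assuming a standard PyTorch app entry point such as app.py) is to export the variable before the first CUDA allocation, which in practice means before importing torch:

```python
import os

# Allocator config is read at CUDA initialization, so set it before
# importing torch (or at least before any tensor touches the GPU).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # import torch only after the variable is set

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Alternatively, Spaces let you define environment variables in the Space settings (under Variables and secrets), which avoids touching the repo code at all. Note that the variable only helps when reserved-but-unallocated memory is large; it can't help if the single allocation is bigger than the whole GPU.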

Problem noted here:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.62 GiB. GPU 0 has a total capacity of 22.05 GiB of which 21.86 GiB is free. Including non-PyTorch memory, this process has 184.00 MiB memory in use. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Bro, I'm really a novice at all this; I've mostly been relying on ChatGPT's help :) you could check with it too.
I was about to try that combination, but thanks for the warning. I think running it on an H200 on some other pod may work.

ZeroGPU AoTI org

@SETRusty looks like a VRAM issue. I'm not sure which hardware you're trying to run it on, but ZeroGPU uses an H200, so that should work.
