Spaces · Running on Zero
Trouble duplicating space
Just wanted to check in to compare troubleshooting approaches. When I try to duplicate this Space onto dedicated Nvidia GPU hardware (i.e. anything other than ZeroGPU), I keep hitting an error indicating a shortage of reserved memory. The error message's inline recommendation is to set the variable below to true in order to avoid fragmentation of GPU memory. Is this a change best made at the project level in the code repo, or is there an easier way to set it locally from the Space's settings panel on Hugging Face?
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Problem noted here:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.62 GiB. GPU 0 has a total capacity of 22.05 GiB of which 21.86 GiB is free. Including non-PyTorch memory, this process has 184.00 MiB memory in use. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
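For the code-repo route, one common approach (a minimal sketch, not specific to any one Space) is to export the variable from Python before PyTorch initializes CUDA. The allocator reads `PYTORCH_CUDA_ALLOC_CONF` when the CUDA context is first set up, so the assignment has to happen before `import torch` runs:

```python
import os

# Must be set before `import torch` executes anywhere in the process;
# once the CUDA caching allocator is initialized, the value is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # import torch (and anything that imports it) only after this point
```

Alternatively, the same key/value pair can be added as an environment variable in the Space's Settings page, which avoids touching the code at all.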
Bro, I'm really a novice at all this — I'd sooner ask ChatGPT for help myself :) just check with it.
I was about to try this combination, but thanks for the warning. I think running it on an H200 on some other pod may work.