CUDA out of memory | Need help

#11
by IlyaCorneli - opened

No matter what I do, I can't get this model to run.

import os
from transformers import AutoTokenizer, AutoModelForCausalLM

os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:1024'

# Replace 'YOUR_TOKEN' with the actual authentication token you received.
tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-9b", token='***')
model = AutoModelForCausalLM.from_pretrained("google/recurrentgemma-9b", token='***')

model = model.to("cuda")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))

Windows 11, 32 GB RAM, RTX 4070 Ti with 12 GB of VRAM.

PS G:\AI\GoogleAI\RecurrentGemmaF9bit> python s.py
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 4/4 [01:12<00:00, 18.12s/it]
Traceback (most recent call last):
  File "G:\AI\GoogleAI\RecurrentGemmaF9bit\s.py", line 10, in <module>
    model = model.to("cuda")
            ^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 2724, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 11.99 GiB total capacity; 26.02 GiB already allocated; 0 bytes free; 26.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This error occurs because the model is larger than your VRAM. By default, from_pretrained loads the weights in full fp32 precision, so a 9B-parameter model needs roughly 36 GB for the weights alone (9B parameters × 4 bytes), far more than the 12 GB on a 4070 Ti. That matches the traceback, which shows 26 GiB already allocated before the transfer fails.

Try 4-bit or 8-bit quantization, which brings the weights down to roughly 5 GB or 9 GB respectively.
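For example, here is a minimal sketch of 4-bit loading with bitsandbytes (requires pip install bitsandbytes accelerate; note that bitsandbytes support on native Windows can be spotty, and the token placeholder is kept as in your script):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4-bit NF4 at load time; matmuls run in bf16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-9b", token='***')
model = AutoModelForCausalLM.from_pretrained(
    "google/recurrentgemma-9b",
    quantization_config=quant_config,
    device_map="auto",  # place layers on the GPU, offloading if needed; no manual .to("cuda")
    token='***',
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))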

Google org

Hi,

Apologies for the delay.

I have successfully reproduced the issue. To resolve it, please enable gradient checkpointing and reduce the batch size and sequence length; a sketch of these settings follows below. For more details, kindly refer to this gist file. If you have any further concerns, let me know and I will assist you.
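For reference, a minimal sketch of those settings in a fine-tuning context (gradient checkpointing only saves memory during training; for plain generate() calls, quantization or reduced precision as suggested above is the more direct fix). The output directory and max_length value below are illustrative, not from the gist:

from transformers import TrainingArguments

# Recompute activations during the backward pass instead of storing them.
model.gradient_checkpointing_enable()

# Cap the sequence length when tokenizing training examples.
inputs = tokenizer(
    input_text,
    return_tensors="pt",
    truncation=True,
    max_length=512,  # illustrative cap
)

# Use the smallest batch size and keep checkpointing on during training.
training_args = TrainingArguments(
    output_dir="out",                # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
)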

Thank you.
