OutOfMemoryError for large corpus #10
opened by escher-cqse
I am using the Salesforce/SFR-Embedding-Code-2B_R model to generate code embeddings.
Both my queries and my corpus are source code methods:
import torch
from transformers import AutoModel

def get_codexembedder_embeddings(self):
    # Load the embedding model and move it to the target device.
    model = AutoModel.from_pretrained('Salesforce/SFR-Embedding-Code-2B_R', trust_remote_code=True)
    model.to(self.device)
    query_instruction_example = "Given Code or Text, retrieve relevant content"
    change_code_batch = list(self.change_data.values())
    test_code_batch = list(self.test_data.values())
    max_length = 32768
    with torch.no_grad():
        # Encode queries (changed methods) and the corpus (test methods).
        query_embeddings = model.encode_queries(change_code_batch, instruction=query_instruction_example, max_length=max_length)
        passage_embeddings = model.encode_corpus(test_code_batch, max_length=max_length)
This fails inside model.encode_corpus(test_code_batch, max_length=max_length) with an out-of-memory error: the model and the entire corpus appear to be moved onto the GPU at the same time. The GPU in question has 24 GB of VRAM, so I would not have expected this to be a problem.
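A rough back-of-envelope suggests the full corpus at this sequence length cannot fit. The dimensions below are my own assumptions (Gemma-2 2B hidden size and bf16 activations), not taken from the model card, but they roughly match the 12.64 GiB allocation in the trace:

# Back-of-envelope; assumed dims: Gemma-2 2B hidden_size=2304, bf16 = 2 bytes.
seq_len, hidden_size, dtype_bytes = 32768, 2304, 2
per_seq_gib = seq_len * hidden_size * dtype_bytes / 2**30
print(f"{per_seq_gib:.2f} GiB per sequence")   # ~0.14 GiB just for the input embeddings
print(f"{12.64 / per_seq_gib:.0f} sequences")  # ~90 sequences would explain the 12.64 GiB request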
What is the recommended approach for handling a larger corpus? Is it still correct to process each source code snippet separately with encode_corpus?
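Something like the following chunked loop is what I have in mind. This is just a sketch; encode_corpus_batched and batch_size are my own names, and I am assuming encode_corpus returns a torch tensor:

import torch

def encode_corpus_batched(model, corpus, max_length, batch_size=8):
    # Hypothetical helper: slice the corpus so that only batch_size
    # sequences sit on the GPU at any one time.
    chunks = []
    with torch.no_grad():
        for start in range(0, len(corpus), batch_size):
            emb = model.encode_corpus(corpus[start:start + batch_size], max_length=max_length)
            chunks.append(emb.cpu())  # move each slice off the GPU right away
    return torch.cat(chunks, dim=0)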
Here is the stack trace:
Traceback (most recent call last):
File "/opt/teamscale/ml/code_embedder.py", line 202, in <module>
main()
File "/opt/teamscale/ml/code_embedder.py", line 198, in main
embedder.save_to_file(embedder.get_codexembedder_embeddings(), "codexembed")
File "/opt/teamscale/ml/code_embedder.py", line 146, in get_codexembedder_embeddings
passage_embeddings = model.encode_corpus(test_code_batch, max_length=max_length)
File "/home/teamscale/.cache/huggingface/modules/transformers_modules/Salesforce/SFR-Embedding-Code-2B_R/c73d8631a005876ed5abde34db514b1fb6566973/modeling_gemma2.py", line 1392, in encode_corpus
return self.encode_text(corpus, max_length)
File "/home/teamscale/.cache/huggingface/modules/transformers_modules/Salesforce/SFR-Embedding-Code-2B_R/c73d8631a005876ed5abde34db514b1fb6566973/modeling_gemma2.py", line 1376, in encode_text
model_output = self.model(**encoded_input)
File "/home/teamscale/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/teamscale/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/teamscale/.cache/huggingface/modules/transformers_modules/Salesforce/SFR-Embedding-Code-2B_R/c73d8631a005876ed5abde34db514b1fb6566973/modeling_gemma2.py", line 815, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/teamscale/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/teamscale/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/teamscale/.local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
return F.embedding(
File "/home/teamscale/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.64 GiB. GPU 0 has a total capacity of 21.95 GiB of which 11.67 GiB is free. Process 16185 has 10.12 GiB memory in use. Of the allocated memory 9.85 GiB is allocated by PyTorch, and 55.83 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
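Following the hint at the end of the trace, I could also set PYTORCH_CUDA_ALLOC_CONF, although as far as I understand this only mitigates fragmentation and would not shrink a single 12.64 GiB request:

import os
# Must be set before torch initializes CUDA, e.g. before the first CUDA call.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
import torch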