How to fine-tune this model?
Hi all! Can anyone share or advise on how to fine-tune this model, e.g. using sentence-transformers or other tools? A tutorial or code examples would be great :)
Thanks!
Hi @ququwowo , unfortunately we don't have any tutorials or code examples right now. SentenceTransformerTrainer might work with some changes, but we haven't tested it, so I can't give any tips yet. We will most likely publish a simple fine-tuning tutorial in the next few weeks and I'll let you know when it's ready.
That's good news!
Hi @jupyterjazz,
Hope all is well! May I follow up on this item?
Thanks!
I can report that the sentence-transformers code that successfully fine-tunes v3 will NOT work if you simply swap in v4; it fails with this error:
File "/opt/conda/lib/python3.11/site-packages/sentence_transformers/trainer.py", line 186, in __init__
if tokenizer is None and isinstance(model.tokenizer, PreTrainedTokenizerBase):
^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
AttributeError: 'SentenceTransformer' object has no attribute 'tokenizer'
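For reference, a minimal repro of that failure looks like this (toy dataset; the names are placeholders, and a working fix is shown further down the thread):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer

model = SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True)
dummy = Dataset.from_dict({"anchor": ["a"], "positive": ["b"], "negative": ["c"]})

# Raises AttributeError: when no tokenizer is passed, the trainer's __init__
# probes model.tokenizer, which v4's custom wrapper does not expose.
trainer = SentenceTransformerTrainer(model=model, train_dataset=dummy)
```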
Hello, I would also be interested in getting more information on how to fine-tune this multimodal embedding model for domain-specific use cases.
I was able to get fine-tuning to work with the SentenceTransformerTrainer by passing `tokenizer=model.tokenize` to the class instantiation.
I'll note that this model is VERY memory-intensive. On a 48 GB GPU I was able to train with the retrieval task type at a batch size of 1 triplet; any larger batch size resulted in an OOM error.
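For anyone who wants to reproduce this, here is a minimal sketch of the setup that worked for me. The toy dataset, the loss, and the `default_task` model kwarg are assumptions for illustration (check the v4 model card for the exact task-setting mechanism); the essential part is the `tokenizer=model.tokenize` line:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Assumption: the v4 custom code accepts a default task at load time;
# if not, set the task as described on the model card.
model = SentenceTransformer(
    "jinaai/jina-embeddings-v4",
    trust_remote_code=True,
    model_kwargs={"default_task": "retrieval"},
)

# Toy triplet data; replace with your own (anchor, positive, negative) rows.
train_dataset = Dataset.from_dict(
    {
        "anchor": ["What is the capital of France?"],
        "positive": ["Paris is the capital of France."],
        "negative": ["Berlin is the capital of Germany."],
    }
)

args = SentenceTransformerTrainingArguments(
    output_dir="jina-v4-finetuned",
    per_device_train_batch_size=1,  # anything larger OOMed on my 48 GB GPU
    num_train_epochs=1,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
    tokenizer=model.tokenize,  # the workaround: avoids the model.tokenizer lookup
)
trainer.train()
```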
Unless you need multimodal embeddings, I'd personally suggest sticking with jina-embeddings-v3, which is better supported by ST and has a FAR smaller memory footprint. Here is some boilerplate code you can use to get started with that: https://huggingface.co/jinaai/jina-embeddings-v3/discussions/128#683a0102dca13af58b586ebd (pay attention to my later comments on that post: you'll need to set the default task every time you load the model, or when you call `encode`).
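In case the link goes stale, the gist of that v3 setup is below. The task names follow the v3 model card; treat the load-time `default_task` kwarg as an assumption and verify it against the card and the linked post:

```python
from sentence_transformers import SentenceTransformer

# Option 1 (assumption): pin the default task when loading the model.
model = SentenceTransformer(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,
    model_kwargs={"default_task": "retrieval.passage"},
)

# Option 2: pass the task (and matching prompt) on every encode call.
embeddings = model.encode(
    ["An example passage to embed."],
    task="retrieval.passage",
    prompt_name="retrieval.passage",
)
```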