Specify library_name metadata

by tomaarsen HF Staff - opened Feb 1, 2024

base: refs/heads/main ← from: refs/pr/1

+30 -29

tomaarsen

Feb 1, 2024 • edited Feb 1, 2024

Hello @avsolatorio !

Pull Request overview

Specify library_name metadata

Details

Hugging Face will be able to automatically add "Use in Sentence Transformers" etc. buttons to your model if the library_name is specified as Sentence Transformers.

Edit: It seems that the online README metadata editor doesn't like floating point values ending in .0, nor that end-of-file marker. Apologies that this made the PR a bit bigger than I had intended.

Tom Aarsen

Specify library_name metadata 5e8fd848

avsolatorio

Owner Feb 1, 2024

Thanks! :D

avsolatorio changed pull request status to merged Feb 1, 2024

tomaarsen

Feb 1, 2024

Hello!

I see now that the pipeline can't be automatically inferred anymore.

We can resolve this by setting the pipeline_tag to either sentence-similarity or feature-extraction. Most people also add both to the tags so that the model is easier to find in search.

pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity

bge-m3 is an example that uses Sentence Similarity, and intfloat/multilingual-e5-large is an example that uses Feature Extraction.

This also affects the default pipeline used in the free serverless Inference Endpoints, e.g. see this link: https://huggingface.co/intfloat/multilingual-e5-large?inference_api=true

So, in short, I would recommend choosing the one that you prefer and adding it in the metadata :)
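As a rough illustration of the metadata change described above, here is a minimal sketch that sets pipeline_tag in the YAML front matter at the top of a README. The helper name set_pipeline_tag is hypothetical, and the sketch assumes the front matter is delimited by `---` lines and that pipeline_tag, if present, is a top-level key on its own line. It only does plain string handling; the huggingface_hub library also offers a metadata_update helper that is likely the more robust choice for a real repository.

```python
def set_pipeline_tag(readme_text: str, pipeline_tag: str) -> str:
    """Return readme_text with pipeline_tag set in its YAML front matter.

    Hypothetical helper for illustration only: it does not parse YAML,
    it just looks for a top-level 'pipeline_tag:' line.
    """
    lines = readme_text.splitlines()
    new_line = f"pipeline_tag: {pipeline_tag}"

    # No front matter yet: create a new block at the top of the file.
    if not lines or lines[0].strip() != "---":
        return "\n".join(["---", new_line, "---", *lines]) + "\n"

    # Find the line that closes the front matter block.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("front matter is opened with '---' but never closed")

    # Replace an existing pipeline_tag, or insert one as the first key.
    for i in range(1, end):
        if lines[i].startswith("pipeline_tag:"):
            lines[i] = new_line
            break
    else:
        lines.insert(1, new_line)

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    readme = "---\nlibrary_name: sentence-transformers\n---\n# Model\n"
    print(set_pipeline_tag(readme, "sentence-similarity"))
```

Either sentence-similarity or feature-extraction can be passed as the second argument, depending on which pipeline you prefer.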

Tom Aarsen

avsolatorio

Owner Feb 1, 2024

Hello Tom,

Great suggestion! I tried changing the default pipeline to sentence similarity, but it did not work. 😅

At first, it complained that no pytorch_model.bin existed in the model directory. I attempted to fix this by uploading a pytorch_model.bin file created using torch.save(model, 'pytorch_model.bin'), but this seems incorrect since I am getting a new error saying the 'SentenceTransformer' object has no attribute 'keys'.

Do you have any suggestions on how I might address this correctly?

Thank you!

Best,
Aivin

tomaarsen

Feb 1, 2024 • edited Feb 1, 2024

Oh, that makes sense actually - Sentence Transformers only very recently got model.safetensors support, so the pipeline code probably still uses an older version.
Saving a model in the old pytorch_model.bin format is a bit tricky with Sentence Transformers:

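from sentence_transformers import SentenceTransformer
model = SentenceTransformer("avsolatorio/GIST-Embedding-v0")
# safe_serialization=False writes pytorch_model.bin instead of model.safetensors
model[0].auto_model.save_pretrained("tmp", safe_serialization=False)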

This gets the underlying transformers model and saves only that, which keeps the saved weights compatible with core transformers.

I'll make you a PR!

Tom Aarsen
