thekop79/dexml_eurlex-4k_hnm

thekop79 committed on Aug 18, 2024

Commit b7e71da · verified · 1 Parent(s): a02af01

Create README.md

Files changed (1)

README.md +50 -0

README.md ADDED

	@@ -0,0 +1,50 @@

+---
+language: en
+license: apache-2.0
+library_name: sentence-transformers
+tags:
+- sentence-transformers
+- feature-extraction
+- sentence-similarity
+- transformers
+pipeline_tag: sentence-similarity
+---
+DistilBERT encoder models trained on the European law document tagging dataset (EURLex-4K) using [DEXML with cross-batch mix negative sampling](https://github.com/thekop69/two-tower-dissertation), adapted from the method in [Dual Encoder for eXtreme Multi-Label classification, ICLR'24](https://arxiv.org/pdf/2310.10636v2.pdf).
+## Inference Usage (Sentence-Transformers)
+With `sentence-transformers` installed, you can use this model as follows:
+```python
+from sentence_transformers import SentenceTransformer
+sentences = ["This is an example sentence", "Each sentence is converted"]
+model = SentenceTransformer('quicktensor/dexml_eurlex-4k')
+embeddings = model.encode(sentences)
+print(embeddings)
+```
+## Usage (HuggingFace Transformers)
+With Hugging Face Transformers, you need to pool the transformer output yourself to get the embedding: take the hidden state of the [CLS] token and L2-normalize it. You can use this model as follows:
+```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+import torch.nn.functional as F
+pooler = lambda x: F.normalize(x[:, 0, :], dim=-1) # Choose CLS token and normalize
+sentences = ["This is an example sentence", "Each sentence is converted"]
+tokenizer = AutoTokenizer.from_pretrained('quicktensor/dexml_eurlex-4k')
+model = AutoModel.from_pretrained('quicktensor/dexml_eurlex-4k')
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+with torch.no_grad():
+    embeddings = pooler(model(**encoded_input).last_hidden_state)
+print(embeddings)
+```
+## Cite the original authors
+If you found this model helpful, please cite the original DEXML paper:
+```bib
+@InProceedings{DEXML,
+  author    = "Gupta, N. and Khatri, D. and Rawat, A-S. and Bhojanapalli, S. and Jain, P. and Dhillon, I.",
+  title     = "Dual-encoders for Extreme Multi-label Classification",
+  booktitle = "International Conference on Learning Representations",
+  month     = "May",
+  year      = "2024"
+}
+```
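
The pooling step in the Transformers usage above (take the first token's vector, then L2-normalize) can be checked in isolation without downloading the model. The following is a minimal, dependency-free sketch, not code from the README or the DEXML repository: the helper names `cls_pool_and_normalize` and `rank_by_similarity` and the toy hidden states are hypothetical, chosen only for illustration. Because the pooled vectors have unit length, their dot product equals cosine similarity, which is presumably how a sentence-similarity model like this one would be used to compare texts.

```python
import math

def cls_pool_and_normalize(hidden_states):
    """Take the first ([CLS]) token vector of each sequence and L2-normalize it.

    hidden_states: nested list shaped [batch][tokens][hidden_dim].
    (Hypothetical helper; mirrors F.normalize(x[:, 0, :], dim=-1) on plain lists.)
    """
    pooled = []
    for sequence in hidden_states:
        cls_vector = sequence[0]  # position 0 holds the [CLS] token
        norm = math.sqrt(sum(v * v for v in cls_vector))
        pooled.append([v / norm for v in cls_vector])
    return pooled

def rank_by_similarity(query, candidates):
    """Return candidate indices sorted by dot product with the query, best first."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Toy hidden states standing in for real model output (values are made up).
toy_hidden_states = [
    [[3.0, 4.0], [9.0, 9.0]],  # sequence 0: CLS vector is [3, 4]
    [[0.0, 2.0], [1.0, 1.0]],  # sequence 1: CLS vector is [0, 2]
    [[5.0, 0.0], [2.0, 2.0]],  # sequence 2: CLS vector is [5, 0]
]

embeddings = cls_pool_and_normalize(toy_hidden_states)
# Rank sequences 1 and 2 by similarity to sequence 0.
ranking = rank_by_similarity(embeddings[0], embeddings[1:])
print(embeddings)
print(ranking)
```

With real model output, the same two steps apply to the `last_hidden_state` tensor; the nested lists here only stand in for it.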