---
language: en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---
DistilBERT encoder model trained on the MovieLens-25M ratings dataset using the [DEXML](https://github.com/nilesh2797/DEXML) ([Dual Encoder for eXtreme Multi-Label classification, ICLR'24](https://arxiv.org/pdf/2310.10636v2.pdf)) method.
## Inference Usage (Sentence-Transformers)
With `sentence-transformers` installed, you can use this model as follows:
```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('quicktensor/dexml_movielens-25m')
embeddings = model.encode(sentences)
print(embeddings)
```
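A common use of these embeddings is scoring a user/query text against candidate movie (label) texts by cosine similarity. Below is a minimal sketch using `sentence_transformers.util.cos_sim`; the query and label strings are made-up examples for illustration, not taken from the MovieLens-25M data:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('quicktensor/dexml_movielens-25m')

# Hypothetical query and candidate label texts, for illustration only
query = "A user who likes sci-fi thrillers and space adventures"
labels = ["Interstellar (2014)", "The Notebook (2004)", "Alien (1979)"]

query_emb = model.encode(query, convert_to_tensor=True)
label_embs = model.encode(labels, convert_to_tensor=True)

# Cosine similarity between the query and each candidate label
scores = util.cos_sim(query_emb, label_embs)
print(scores)
```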
## Usage (HuggingFace Transformers)
With Hugging Face `transformers` you need to pool the transformer output yourself to get the embedding; this model uses the L2-normalized CLS token. You can use it as follows:
```python
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Take the CLS token embedding and L2-normalize it
pooler = lambda x: F.normalize(x[:, 0, :], dim=-1)

sentences = ["This is an example sentence", "Each sentence is converted"]

tokenizer = AutoTokenizer.from_pretrained('quicktensor/dexml_movielens-25m')
model = AutoModel.from_pretrained('quicktensor/dexml_movielens-25m')

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    embeddings = pooler(model(**encoded_input).last_hidden_state)

print(embeddings)
```
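Because the pooler L2-normalizes the CLS embeddings, relevance between two sets of texts can be scored with a plain dot product (equivalent to cosine similarity). A minimal sketch, reusing the `tokenizer`, `model`, and `pooler` defined above; the query and candidate texts are hypothetical examples:
```python
# Hypothetical query and candidate texts, for illustration only
queries = ["A user who enjoys animated family movies"]
candidates = ["Toy Story (1995)", "The Godfather (1972)"]

def embed(texts):
    # Tokenize, encode, and pool into normalized embeddings
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        return pooler(model(**batch).last_hidden_state)

# Dot product of normalized embeddings = cosine similarity
scores = embed(queries) @ embed(candidates).T
print(scores)
```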
## Cite
If you find this model helpful, please cite our work as:
```bibtex
@InProceedings{DEXML,
  author    = "Gupta, N. and Khatri, D. and Rawat, A-S. and Bhojanapalli, S. and Jain, P. and Dhillon, I.",
  title     = "Dual-encoders for Extreme Multi-label Classification",
  booktitle = "International Conference on Learning Representations",
  month     = "May",
  year      = "2024"
}
```