Tom Aarsen

tomaarsen

https://linkedin.com/in/tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

new activity about 4 hours ago

jxm/cde-small-v2:Clean up README slightly

new activity about 6 hours ago

Alibaba-NLP/gte-modernbert-base:Entering on MTEB

new activity about 6 hours ago

Alibaba-NLP/gte-modernbert-base:NaN values when input is longer than context window?

Articles

Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

Mar 22, 2024

• 70

🪆 Introduction to Matryoshka Embedding Models

Feb 23, 2024

• 73

SetFitABSA: Few-Shot Aspect Based Sentiment Analysis using SetFit

Dec 6, 2023

• 6

🕳️ Attention Sinks in LLMs for endless fluency

Oct 9, 2023

• 7

Organizations

tomaarsen's activity

New activity in jxm/cde-small-v2 about 4 hours ago

Clean up README slightly

#7 opened 7 days ago by

tomaarsen

New activity in Alibaba-NLP/gte-modernbert-base about 6 hours ago

Entering on MTEB

#12 opened about 6 hours ago by

tomaarsen

NaN values when input is longer than context window?

#11 opened about 7 hours ago by

AHuguet

liked a model about 9 hours ago

mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1

Text Generation • Updated about 15 hours ago • 5

liked a model about 16 hours ago

flozi00/GermanEduScorer-ModernBERT-base

Text Classification • Updated 1 day ago • 94 • 2

liked 2 models about 18 hours ago

minishlab/potion-retrieval-32M

Updated 1 day ago • 22 • 9

minishlab/potion-base-32M

Updated 1 day ago • 42 • 5

upvoted 2 articles about 18 hours ago

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

1 day ago

• 19

Article

State of open video generation models in Diffusers

4 days ago

• 23

liked a dataset 1 day ago

lightonai/ms-marco-en-bge

Viewer • Updated Aug 26, 2024 • 10.5M • 193 • 6

upvoted 2 articles 1 day ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

3 days ago

• 461

Article

🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker!

2 days ago

• 13

upvoted a paper 2 days ago

SPLADE-v3: New baselines for SPLADE

Paper • 2403.06789 • Published Mar 11, 2024 • 2

New activity in Salesforce/SFR-Embedding-Code-2B_R 2 days ago

Add Sentence Transformers integration

#7 opened 10 days ago by

tomaarsen

updated a model 2 days ago

Salesforce/SFR-Embedding-Code-2B_R

Feature Extraction • Updated 2 days ago • 1.44k • 24

liked a Space 2 days ago

😻

Like History

upvoted an article 2 days ago

Article

Welcome to Inference Providers on the Hub 🔥

3 days ago

• 171

New activity in tomaarsen/gooaq-embeddings 3 days ago

Librarian Bot: Add language metadata for dataset

#2 opened 3 days ago by

librarian-bot

updated a dataset 5 days ago

tomaarsen/gooaq-embeddings

Viewer • Updated 3 days ago • 6.02M • 25

published a dataset 5 days ago

tomaarsen/gooaq-embeddings

Viewer • Updated 3 days ago • 6.02M • 25

Articles

Train 400x faster Static Embedding Models with Sentence Transformers

Finally, a Replacement for BERT: Introducing ModernBERT

Welcome Gemma 2 - Google's new open LLM

Training and Finetuning Embedding Models with Sentence Transformers v3

Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon
