xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference Paper • 2503.13427 • Published 1 day ago • 1
UniBERTs: Adversarial Training for Language-Universal Representations Paper • 2503.12608 • Published 2 days ago • 1
Do Construction Distributions Shape Formal Language Learning In German BabyLMs? Paper • 2503.11593 • Published 4 days ago • 1
Post • We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction Paper • 2401.17948 • Published Jan 31, 2024 • 4
Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan Paper • 2503.07827 • Published 8 days ago • 1
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published 11 days ago • 72 • 9
Post • UPDATE: 4-bit inference is working! The blog post now includes a code snippet and requirements.txt: https://devquasar.com/uncategorized/all-about-amd-and-rocm/ I've played around with an MI100 and ROCm and collected my experience in a blog post: https://devquasar.com/uncategorized/all-about-amd-and-rocm/ Unfortunately, I could not get inference or training working with a model loaded in 8-bit or with BnB, but I did everything else and documented my findings.
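The author's exact snippet lives in the linked blog post; as a rough illustration only, here is a minimal sketch of the usual transformers + bitsandbytes 4-bit loading path. The model ID is a placeholder, and whether a stock bitsandbytes build runs on a given ROCm setup is exactly the kind of thing the post documents.

```python
# Minimal sketch: 4-bit inference with transformers + bitsandbytes.
# Placeholder model ID; see the linked blog post for the author's actual
# snippet and requirements.txt. A ROCm-enabled bitsandbytes build may be
# required on AMD hardware. Needs: torch, transformers, accelerate, bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder, not from the post

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on the available GPU
)

inputs = tokenizer("Hello from an MI100:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```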
Post • 🇹🇷 I'm very happy to finally announce my new Turkish LM called "BERT5urk": stefan-it/bert5urk. It is a 1.42B-parameter T5-based model, trained with the UL2 pretraining objective on the Turkish part of the awesome HuggingFaceFW/fineweb-2 dataset. Feel free to check it out!
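As a hedged sketch of how one might try the model, assuming the repo exposes a standard T5/UL2-style seq2seq checkpoint loadable via transformers (the prompt below is illustrative; UL2 checkpoints sometimes expect mode prefixes such as "[NLU]", so the model card is authoritative):

```python
# Minimal sketch: loading BERT5urk for T5-style span infilling.
# Assumes a standard seq2seq checkpoint; check the model card for the
# expected prompt format (e.g. UL2 mode tokens).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "stefan-it/bert5urk"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative Turkish prompt: the model predicts the masked <extra_id_0> span.
text = "Ankara, Türkiye'nin <extra_id_0> şehridir."
inputs = tokenizer(text, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```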