Vilém Zouhar

zouharvi

https://vilda.net/

AI & ML interests

MT/NLG metrics, human evaluation, uncertainty

Recent Activity

updated a dataset 5 days ago

zouharvi/hearing2translate-humeval

new activity 8 days ago

zouharvi/hearing2translate-humeval:Centering Logo

published a dataset 8 days ago

zouharvi/hearing2translate-humeval

upvoted a paper 17 days ago

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Paper • 2512.16378 • Published 19 days ago • 7

upvoted a paper 3 months ago

Estimating Machine Translation Difficulty

Paper • 2508.10175 • Published Aug 13, 2025 • 3

upvoted a collection 5 months ago

Translation Difficulty Estimators

This collection hosts the two Translation Difficulty estimators studied in https://arxiv.org/abs/2508.10175. • 3 items • Updated Sep 17, 2025 • 3

upvoted a paper 6 months ago

Can Large Language Models Capture Human Annotator Disagreements?

Paper • 2506.19467 • Published Jun 24, 2025 • 18

upvoted 2 collections 10 months ago

MT Sentinel Metrics

Machine Translation (MT) metrics designed explicitly to scrutinize the MT meta-evaluation process’s accuracy, robustness, and fairness. • 7 items • Updated Dec 4, 2024 • 7

✍️ QE4PE & GroTE

Materials for "QE4PE: Word-level Quality Estimation for Human Post-Editing" • 3 items • Updated Mar 6, 2025 • 1

upvoted a paper 10 months ago

QE4PE: Word-level Quality Estimation for Human Post-Editing

Paper • 2503.03044 • Published Mar 4, 2025 • 6

upvoted a collection 10 months ago

COMET-early-exit

Models introduced in the paper "Early-Exit and Instant Confidence Translation Quality Estimation". https://github.com/zouharvi/COMET-early-exit • 4 items • Updated Feb 21, 2025 • 2

upvoted 2 papers 10 months ago

We Can't Understand AI Using our Existing Vocabulary

Paper • 2502.07586 • Published Feb 11, 2025 • 10

Early-Exit and Instant Confidence Translation Quality Estimation

Paper • 2502.14429 • Published Feb 20, 2025 • 4

upvoted a collection 10 months ago

PreCOMET

COMET-like models for MT evaluation that predict some scores given only the source segment. https://github.com/zouharvi/subset2evaluate • 8 items • Updated Feb 25, 2025 • 2

upvoted a paper 11 months ago

How to Select Datapoints for Efficient Human Evaluation of NLG Models?

Paper • 2501.18251 • Published Jan 30, 2025 • 2