Nandan Thakur
nthakur
AI & ML interests
NLP, IR, QA
Recent Activity
reacted to their post with 🔥 2 days ago
Last year, I curated & generated a few multilingual SFT and DPO datasets by translating English SFT/DPO datasets into 9-10 languages using the https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 model.
I hope they help the community with pretraining/instruction tuning of multilingual LLMs! I added a small diagram that briefly describes which datasets are included and their sources.
Happy to collaborate with anyone who wants to use these datasets for instruction fine-tuning or to extend them with translated versions of newer English SFT/DPO datasets!
https://huggingface.co/collections/nthakur/multilingual-sft-and-dpo-datasets-67eaf56fe3feca5a57cf7d74
updated a collection 2 days ago
🏜️MIRAGE-Bench [NAACL'25]
Organizations
nthakur's activity
Is the training split available?
1
#3 opened 3 months ago by nthakur

What is the difference between 'facebook/mcontriever-msmarco' and yours?
2
#1 opened 10 months ago by cramraj8
Dataset Viewer issue: DatasetWithScriptNotSupportedError
1
#1 opened over 1 year ago by nthakur
