Recent Activity

ShirinYamani updated a dataset 11 days ago
trl-lib/documentation-images
qgallouedec updated a Space 16 days ago
trl-lib/recommend-vllm-memory
qgallouedec updated a Space 16 days ago
trl-lib/trackio

sergiopaniego
posted an update 6 days ago
It's now possible to do end-to-end ML without leaving the @huggingface Hub by combining TRL + HF Jobs + Trackio!!

🐑We just released a full guide explaining the process.

Go check it out!

📖 Guide: https://huggingface.co/docs/trl/main/en/jobs_training

💡 Reminder: HF Jobs is only available for Pro, Team, or Enterprise plans. Yet another reason to upgrade.
sergiopaniego
posted an update 21 days ago
sergiopaniego
posted an update 22 days ago
New Zero-Shot Object Detectors in transformers! 🥽

We've added LLMDet and MM GroundingDINO, plus a demo Space to compare them with others 🖼️

Play with it: ariG23498/zero-shot-od
sergiopaniego
posted an update 22 days ago
sergiopaniego
posted an update 26 days ago
Latest TRL release brings major upgrades for multimodal alignment!

We dive into 3 new techniques to improve VLM post-training in our new blog:

🌋 GRPO
🎞️ GSPO
MPO
➕ vLLM integration for online training w/ transformers backend

Blog: https://huggingface.co/blog/trl-vlm-alignment
sergiopaniego
posted an update 28 days ago
sergiopaniego
posted an update 29 days ago
Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋

🧑‍🍳 We've got you covered!!

NEW multimodal post-training recipe to align a VLM using TRL in @HuggingFace's Cookbook.

Go to the recipe 👉 https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋
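
For readers new to GRPO, the core idea is that each sampled completion is scored relative to the other completions generated for the same prompt. Below is a minimal pure-Python sketch of that group-relative advantage. It is an illustration only, not TRL's implementation: the function name, the `eps` constant, and the use of the population standard deviation are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    `eps` is a hypothetical stabilizer that avoids division by zero
    when every completion in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std: an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for the same prompt, scored by some reward function.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and the rest get a negative one, so no separate value model is needed to produce a baseline.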
sergiopaniego
posted an update 29 days ago
Just included example scripts for aligning models using GSPO (including VLM example) 🙆‍♂️🙆‍♂️

GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.

Super-easy-to-get-started example scripts below, GO run them! 👩‍💻👩‍💻

🧑‍🎨 Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
🦄 VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
🧩 More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
🧙‍♂️ GSPO paper: Group Sequence Policy Optimization (2507.18071)
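
As described in the GSPO paper, the main change relative to GRPO is that the importance ratio is defined per sequence, normalized by length, instead of per token. Here is a small pure-Python sketch of that difference. The function names and the log-probability values are made up for illustration, and this is not the code from the linked scripts.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """One importance ratio per token (the GRPO-style view)."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """A single length-normalized ratio for the whole sequence:
    exp(mean over tokens of (new_logp - old_logp))."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


# Hypothetical per-token log-probabilities of one sampled completion
# under the old (sampling) policy and the current policy.
old_logps = [-1.0, -2.0, -0.5]
new_logps = [-0.9, -2.3, -0.4]

print(token_level_ratios(new_logps, old_logps))
print(sequence_level_ratio(new_logps, old_logps))
```

The sequence-level ratio is the geometric mean of the token-level ratios, so a single outlier token moves it less than it would move that token's own ratio.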
sergiopaniego
posted an update about 1 month ago
Did you miss this? 👓

🧙‍♂️ vLLM + transformers integration just got upgraded with direct VLM support.

Select a VLM + model_impl=transformers and play via vLLM!
sergiopaniego
posted an update about 1 month ago
We just released TRL v0.20 with major multimodal upgrades!

👁️ VLM support for GRPO (highly requested by the community!)
🎞️ New GSPO trainer (from @Qwen, released last week, VLM-ready)
New MPO trainer (multimodal by design, as in the paper)

Full release notes here: https://github.com/huggingface/trl/releases/tag/v0.20.0
qgallouedec
published a model about 1 month ago
sergiopaniego
posted an update about 1 month ago
Yet Another New Multimodal Fine-Tuning Recipe 🥧

🧑‍🍳 In this @HuggingFace Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) with trl.

💡 This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!

We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).

Check it out! ➡️ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo
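
To make "multiple optimization objectives" concrete, here is a rough pure-Python sketch that mixes two losses for one preference pair: a DPO-style sigmoid preference loss and an SFT-style negative log-likelihood on the chosen response. It is not the trl API or the recipe's code. The function names, `beta`, the weights, and the input numbers are all hypothetical.

```python
import math


def dpo_sigmoid_loss(chosen_logratio, rejected_logratio, beta=0.1):
    """Preference term: -log(sigmoid(beta * margin)), written as log1p(exp(-x)).

    Each logratio is log pi_policy(y|x) - log pi_reference(y|x).
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    return math.log1p(math.exp(-margin))


def sft_loss(chosen_token_logps):
    """Generation term: mean negative log-likelihood of the chosen response."""
    return -sum(chosen_token_logps) / len(chosen_token_logps)


def mixed_loss(losses, weights):
    """Weighted sum of the individual objectives."""
    return sum(w * l for w, l in zip(weights, losses))


# Made-up numbers for a single (chosen, rejected) pair.
preference = dpo_sigmoid_loss(chosen_logratio=0.8, rejected_logratio=-0.4)
generation = sft_loss([-0.2, -0.7, -0.1])
total = mixed_loss([preference, generation], weights=[0.8, 1.0])  # illustrative weights
print(total)
```

The weights control how much each objective pulls on the model. The preference term only cares about the gap between chosen and rejected, while the generation term keeps the chosen response likely in absolute terms.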