Jack Cloudman (JackCloudman)
Recent Activity

updated a model about 4 hours ago
JackCloudman/openhands-lm-32b-v0.1-jackterated
published a model about 5 hours ago
JackCloudman/openhands-lm-32b-v0.1-jackterated
liked a model about 14 hours ago
all-hands/openhands-lm-32b-v0.1

Organizations

Hugging Face Discord Community

JackCloudman's activity

reacted to m-ric's post with ❤️ about 19 hours ago
🚀 DeepSeek R1's moment has come for GUI agents: rule-based Reinforcement Learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT): collecting huge datasets of screen captures from people using computers, then fine-tuning your model on them. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified rule-based reward function that scores multiple sampled responses from the model, optimizing via policy-optimization algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses:
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?

Using just 136 carefully selected mobile tasks (versus 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌐 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only on low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)