Morae: Proactively Pausing UI Agents for User Choices
Abstract
Morae is a UI agent that improves accessibility for blind and low-vision (BLV) users by pausing at decision points during task execution, using large multimodal models to interpret user queries and UI elements.
User interface (UI) agents promise to make inaccessible or complex UIs easier to use for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompts users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.
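The pause-at-decision-points loop the abstract describes can be pictured as a small control-flow sketch. The Python below is a minimal illustration, not the paper's implementation: `DecisionPoint`, `detect_decision_point`, and `act_or_pause` are hypothetical names, and the model call is stubbed with a canned result based on the abstract's sparkling-water example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecisionPoint:
    """A point where multiple UI actions would all satisfy the user's query."""
    question: str
    options: list[str]

def detect_decision_point(task: str, ui_code: str, screenshot: bytes) -> Optional[DecisionPoint]:
    # Stand-in for the multimodal-model call described in the abstract: the
    # model reads the task, the page's UI code, and a screenshot, and reports
    # whether several candidate actions are equally consistent with the request.
    # Here we return a canned result reproducing the sparkling-water example.
    return DecisionPoint(
        question="Several sparkling waters cost $0.99. Which one?",
        options=["Lime, 4.8 stars", "Plain, 4.1 stars", "Berry, 4.6 stars"],
    )

def act_or_pause(task: str, ui_code: str, screenshot: bytes) -> str:
    decision = detect_decision_point(task, ui_code, screenshot)
    if decision is None:
        return "no ambiguity: agent proceeds autonomously"
    # Proactively pause: surface the options and let the user choose,
    # instead of silently picking one on the user's behalf.
    print(decision.question)
    for i, option in enumerate(decision.options, start=1):
        print(f"  {i}. {option}")
    choice = int(input("Choose an option number: "))
    return decision.options[choice - 1]

if __name__ == "__main__":
    print(act_or_pause("Buy the cheapest sparkling water", "<html>...</html>", b""))
```

In the actual system, the detection step would be a prompt to a large multimodal model over the query, UI code, and screenshot, and the options would presumably be surfaced through a screen reader rather than stdout.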
Community
The following papers were recommended by the Semantic Scholar API:
- Task Mode: Dynamic Filtering for Task-Specific Web Navigation using LLMs (2025)
- Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement (2025)
- UserBench: An Interactive Gym Environment for User-Centric Agents (2025)
- Magentic-UI: Towards Human-in-the-loop Agentic Systems (2025)
- Machine-Readable Ads: Accessibility and Trust Patterns for AI Web Agents interacting with Online Advertisements (2025)
- Talking-to-Build: How LLM-Assisted Interface Shapes Player Performance and Experience in Minecraft (2025)
- Generative Interfaces for Language Models (2025)