update model card.
README.md
CHANGED
@@ -8,14 +8,14 @@ base_model:

 - [GUI-Actor-7B-Qwen2-VL](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2-VL)
 - [GUI-Actor-2B-Qwen2-VL](https://huggingface.co/microsoft/GUI-Actor-2B-Qwen2-VL)
-- [GUI-Actor-7B-Qwen2.5-VL](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2.5-VL)
-- [GUI-Actor-3B-Qwen2.5-VL](https://huggingface.co/microsoft/GUI-Actor-3B-Qwen2.5-VL)
+- [GUI-Actor-7B-Qwen2.5-VL (coming soon)](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2.5-VL)
+- [GUI-Actor-3B-Qwen2.5-VL (coming soon)](https://huggingface.co/microsoft/GUI-Actor-3B-Qwen2.5-VL)
 - [GUI-Actor-Verifier-2B](https://huggingface.co/microsoft/GUI-Actor-Verifier-2B)

-This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**
+This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://aka.ms/GUI-Actor).
 It is developed based on [Qwen2-VL-7B-Instruct ](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here (coming soon)]().

-For more details on model design and evaluation, please check
+For more details on model design and evaluation, please check: [🏠 Project Page](https://aka.ms/GUI-Actor) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper]().

 ## 📊 Performance Comparison on GUI Grounding Benchmarks
 Table 1. Main results on ScreenSpot-Pro, ScreenSpot, and ScreenSpot-v2 with **Qwen2-VL** as the backbone. † indicates scores obtained from our own evaluation of the official models on Huggingface.
@@ -118,7 +118,7 @@ print(f"Predicted click point: [{round(px, 4)}, {round(py, 4)}]")
 # Predicted click point: [0.9709, 0.1548]
 ```

-## Citation
+## 📝 Citation
 ```
 @article{wu2025guiactor,
 title={GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents},