caca9527 committed on
Commit a69c5eb · verified · 1 Parent(s): 45fa3a1

Update citation

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -7,4 +7,21 @@ license: apache-2.0
 
  <div align=center><img width="800" height="400" src="https://sh-code.mthreads.com/liang.yang/mt-gui/-/raw/master/assets/overview.png?ref_type=heads"/></div>
 
- 🔥🔥🔥 We have open-sourced our self-developed GUI multimodal visual understanding model GUIExplorer, which is based on the model architecture of LLaVA OneVision 7B. It has basic GUI visual understanding capabilities, including regional OCR, Grounding, and single-step instruction execution capabilities. For details on how to train and use this model, please refer to the [\[💻Code\]](https://sh-code.mthreads.com/liang.yang/mt-gui).
+ 🔥🔥🔥 We have open-sourced our self-developed GUI multimodal visual understanding model GUIExplorer, which is based on the model architecture of LLaVA OneVision 7B. It has basic GUI visual understanding capabilities, including regional OCR, Grounding, and single-step instruction execution capabilities. For details on how to train and use this model, please refer to the [\[💻Code\]](https://sh-code.mthreads.com/liang.yang/mt-gui).
+
+
+ **Citation**
+
+ If you use GUIExplorer for your research, please cite our [\[📝Paper\]](https://arxiv.org/abs/2503.11170):
+
+ ```bibtex
+ @misc{xu2025deskvisionlargescaledesktop,
+ title={DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents},
+ author={Yibin Xu and Liang Yang and Hao Chen and Hua Wang and Zhi Chen and Yaohua Tang},
+ year={2025},
+ eprint={2503.11170},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2503.11170},
+ }
+ ```