Sanket17 committed on
Commit
7e4f6b4
·
verified ·
1 Parent(s): 41e8ff3

Update README.md

Files changed (1)
  1. README.md +37 -53
README.md CHANGED
@@ -1,53 +1,37 @@
- # OmniParser API
-
- Self-hosted version of Microsoft's [OmniParser](https://huggingface.co/microsoft/OmniParser) image-to-text model.
-
- > OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent. Training Datasets include: 1) an interactable icon detection dataset, which was curated from popular web pages and automatically annotated to highlight clickable and actionable regions, and 2) an icon description dataset, designed to associate each UI element with its corresponding function.
-
- ## Why?
-
- There's already a great HuggingFace Gradio [app](https://huggingface.co/spaces/microsoft/OmniParser) for this model. It even offers an API. But:
-
- - Gradio is much slower than serving the model directly (as we do here)
- - HF is rate-limited
-
- ## How it works
-
- If you look at the Dockerfile, we start from the HF demo image to retrieve all the weights and utility functions. Then we add a simple FastAPI server (in main.py) to serve the model.
-
- ## Getting Started
-
- ### Requirements
-
- - GPU
- - 16 GB RAM (swap recommended)
-
- ### Locally
-
- 1. Clone the repository
- 2. Build the Docker image: `docker build -t omni-parser-app .`
- 3. Run the Docker container: `docker run -p 7860:7860 omni-parser-app`
-
- ### Self-hosted API
-
- I suggest hosting on [fly.io](https://fly.io) because it's quick and simple to deploy with a CLI.
-
- This repo is ready-made for deployment on fly.io (see fly.toml for configuration). Just run `fly launch` and follow the prompts.
-
- ## Docs
-
- Visit `http://localhost:7860/docs` for the API documentation. There's only one route, `/process_image`, which returns:
-
- - The image with bounding boxes drawn on it (in base64 format)
- - The parsed elements in a list with text descriptions
- - The bounding box coordinates of the parsed elements
-
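A response shaped like the list above could be handled as follows. The JSON field names in this sketch are illustrative assumptions, not taken from the actual API schema:

```python
import base64
import json

# Hypothetical /process_image response (field names are assumptions).
sample = json.dumps({
    "image": base64.b64encode(b"<png bytes>").decode("ascii"),
    "parsed_content_list": ["icon: settings gear"],
    "coordinates": [[10, 20, 64, 64]],
})

data = json.loads(sample)
image_bytes = base64.b64decode(data["image"])  # annotated image, ready to save
```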
- ## Examples
-
- | Before Image | After Image |
- | ---------------------------------- | ----------------------------- |
- | ![Before](examples/screenshot.png) | ![After](examples/after.webp) |
-
- ## Related Projects
-
- Check out [OneQuery](https://query-rho.vercel.app), an agent that browses the web and returns structured responses for any query, simple or complex. OneQuery is built using OmniParser to enhance its capabilities.
 
+ ---
+ license: mit
+ title: Omniparser-api
+ sdk: docker
+ emoji: 😻
+ colorFrom: red
+ colorTo: yellow
+ ---
+
+ # Omniparser API
+
+ The Omniparser API is a versatile and efficient tool designed to parse, process, and analyze various types of documents or datasets using machine learning models.
+
+ ## Features
+
+ - Upload and process documents (e.g., images, PDFs).
+ - Detect objects, text, or patterns within uploaded files.
+ - Analyze and parse structured or unstructured content.
+ - Highly configurable thresholds for precision and flexibility.
+
+ ## How to Use
+
+ 1. **Upload a Document**: Send a file (e.g., an image or PDF) via the `/process/` endpoint.
+ 2. **Adjust Thresholds**: Configure `box_threshold` and `iou_threshold` for desired accuracy.
+ 3. **Receive Results**: Get a JSON response with parsed content and processed outputs.
+
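The three steps above can be sketched as a client call. The endpoint and threshold names come from this README; the multipart form-field names are assumptions:

```python
import requests

# Build (but do not send) a multipart POST to the /process/ endpoint.
req = requests.Request(
    "POST",
    "http://localhost:7860/process/",
    files={"file": ("screenshot.png", b"\x89PNG\r\n", "image/png")},
    data={"box_threshold": "0.05", "iou_threshold": "0.1"},
).prepare()
# To send it for real: response = requests.Session().send(req)
```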
+ ## Endpoints
+
+ - **`GET /`**: Welcome page for the API.
+ - **`POST /process/`**: Upload and process a document with configurable thresholds.
+
+ ## Installation
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/your-username/omniparser-api.git
+ cd omniparser-api