📑paper link

Model Card: DiagramAgent/Diagram_to_Code_Agent

1. Model Overview

  • Name: DiagramAgent/Diagram_to_Code_Agent
  • Description: This agent converts a given diagram image into its corresponding structured code.

2. Intended Use

  • Primary Tasks:
    • Convert existing diagrams into structured code representations.
    • Support diagram editing workflows by providing a reliable code basis for modifications.
    • Capture and preserve implicit logical structures and visual details of diagrams.
  • Application Scenarios:
    • Automated diagram editing: Transforming a diagram into code to enable subsequent modifications.
    • Reverse engineering of visual diagrams for analysis and reusability.
    • Enhancing data visualization tools by integrating code-based diagram representations.

3. Architecture and Training Details

  • Base Model: Built on Qwen2-VL-7B, a vision-language model.
  • Training Process:
    • Trained on diverse diagram samples from the DiagramGenBenchmark dataset.
    • Trained to generate code that closely matches the reference code, so that all diagram elements are accurately captured.
    • Uses a specialized loss function to reduce the edit distance between the generated and reference code.
  • Module Interaction: Works closely with the Check Agent, which validates the generated code and provides feedback for further refinement.
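
The card does not specify the loss function itself, so the sketch below only illustrates the quantity it refers to: the edit distance between generated and reference code. It assumes a character-level Levenshtein distance; the function names edit_distance and normalized_edit_distance are hypothetical and not part of this repository.

```python
def edit_distance(generated: str, reference: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    # prev[j] is the distance between the prefix of `generated` processed
    # so far and reference[:j].
    prev = list(range(len(reference) + 1))
    for i, g_char in enumerate(generated, start=1):
        curr = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if g_char == r_char else 1
            curr.append(
                min(
                    prev[j] + 1,          # delete g_char
                    curr[j - 1] + 1,      # insert r_char
                    prev[j - 1] + cost,   # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def normalized_edit_distance(generated: str, reference: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 0.0
    return edit_distance(generated, reference) / longest
```

A value of 0 means the generated code is identical to the reference; larger values mean more character edits are needed to turn one into the other.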

4. Usage Examples

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# default: Load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "DiagramAgent/Diagram_to_Code_Agent", torch_dtype="auto", device_map="auto"
)

# Default processor
processor = AutoProcessor.from_pretrained("DiagramAgent/Diagram_to_Code_Agent")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "image path",  # path or URL of the diagram image
            },
            {"type": "text", "text": "your input"},  # your instruction
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device the model was loaded on
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
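
For repeated runs, the message construction and the handling of the decoded output can be wrapped in small helpers. This is a minimal sketch: build_messages and save_generated_code are hypothetical names that are not part of this repository, and the sketch assumes output_text is the list of strings returned by processor.batch_decode above.

```python
from pathlib import Path


def build_messages(image_path: str, instruction: str) -> list:
    """Build a single-turn chat message holding one image and one instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def save_generated_code(output_text: list, out_path: str) -> Path:
    """Write the first decoded sequence to out_path and return the path."""
    path = Path(out_path)
    path.write_text(output_text[0], encoding="utf-8")
    return path
```

The list returned by build_messages can be passed to processor.apply_chat_template and process_vision_info in place of the literal messages list.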

5. Citation

If you find our work helpful, please cite our paper.

@inproceedings{wei2024wordsstructuredvisualsbenchmark,
  title={From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing},
  author={Jingxuan Wei and Cheng Tan and Qi Chen and Gaowei Wu and Siyuan Li and Zhangyang Gao and Linzhuang Sun and Bihui Yu and Ruifeng Guo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}