---
library_name: torch
tags:
- image-classification
- resnet
- diagrams
- pytorch
- computer-vision
license: apache-2.0
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- microsoft/resnet-18
pipeline_tag: image-classification
datasets:
- phiyodr/coco2017
- HuggingFaceM4/ChartQA
- JasmineQiuqiu/diagrams_with_captions_2
---

# Model Card for Diagram Classification Model

## Model Details

### Model Description

This is a fine-tuned ResNet-18 model trained for binary image classification, distinguishing between **diagrams** and **non-diagrams**. The model is designed for use in applications that need automatic filtering or processing of diagram-based content.

- **Developed by:** Aya Mohamed
- **Model type:** ResNet-18 (Fine-tuned for image classification)
- **Language(s) (NLP):** Not applicable (Computer Vision model)
- **License:** Apache 2.0
- **Finetuned from model:** `microsoft/resnet-18`

### Model Sources

- **Repository:** [Ayamohamed/diaclass-model](https://huggingface.co/Ayamohamed/diaclass-model)

## Uses

### Direct Use

This model is intended for classifying images as **diagrams** or **non-diagrams**. It can be used in:
- **Document processing** (extracting diagrams from PDFs or scanned documents)
- **Chart-based visual question generation (VQG)**
- **Content moderation** (filtering diagram images from general image datasets)
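
As an illustration of the filtering use case, the sketch below keeps only the files that a classifier labels as diagrams. The names `filter_diagrams` and `IMAGE_SUFFIXES` are hypothetical helpers introduced for this example, and `classify` stands for any function that maps an image path to the label strings `"Diagram"` or `"Not Diagram"`, such as the `predict` helper in the usage section of this card.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumed set of image extensions; adjust to the files you actually process.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def filter_diagrams(paths: Iterable[Path], classify: Callable[[str], str]) -> List[Path]:
    """Return the image paths that `classify` labels as "Diagram"."""
    kept = []
    for path in paths:
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        if classify(str(path)) == "Diagram":
            kept.append(path)
    return kept
```

With the model loaded, a call might look like `filter_diagrams(Path("scans").iterdir(), predict)`, where `scans` is a placeholder directory name.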

### Out-of-Scope Use

- Not suitable for **multi-class classification** beyond diagrams vs. non-diagrams.
- Not designed for **hand-drawn sketches** or **complex figures with mixed elements**.

## Bias, Risks, and Limitations

- The model's accuracy depends on the training dataset, which may not cover all possible diagram styles.
- May misclassify **charts, blueprints, or artistic drawings** if they resemble diagrams.

### Recommendations

Users should **evaluate the model** on their specific dataset before deployment to ensure it performs well in their context.
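
One way to run such a check is to collect predictions on a labeled sample of your own images and compute the metrics reported in this card. The sketch below is a minimal, dependency-free version; `binary_metrics` is a hypothetical helper, and it assumes labels are encoded as class indices with `0` meaning diagram, matching the index mapping in the usage code of this card.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 0) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the diagram class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

A large gap between these numbers and the test metrics in this card suggests that your images differ from the training distribution.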



## 🚀 How to Use

### **1️⃣ Load the Model from Hugging Face**
Download the checkpoint with `huggingface_hub` and load it with `torch`.

```python
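import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
model_path = hf_hub_download(repo_id="Ayamohamed/DiaClassification", filename="model.pth")

# The file is assumed to hold a full pickled model object (it is used directly below),
# so weights_only=False is needed on PyTorch >= 2.6, where torch.load defaults to True.
# Unpickling can execute arbitrary code: only load checkpoints from a source you trust.
model_hg = torch.load(model_path, map_location="cpu", weights_only=False)
model_hg.eval()  # Set to evaluation mode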
```
### **2️⃣ Preprocess and Classify an Image**
```python
import torch
from PIL import Image
from torchvision import transforms

# ImageNet-style preprocessing: resize, convert to tensor, normalize each channel
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])


def predict(image_path):
    """Classify one image file with `model_hg`, loaded in step 1."""
    image = Image.open(image_path).convert("RGB")  # force three channels
    batch = transform(image).unsqueeze(0)  # add a batch dimension: (1, 3, 224, 224)
    with torch.no_grad():  # gradients are not needed for inference
        output = model_hg(batch)
    class_idx = torch.argmax(output, dim=1).item()
    # Class index 0 is the diagram class
    return "Diagram" if class_idx == 0 else "Not Diagram"


# Example usage
print(predict("my-diagram-classifier/31188_1536932698.jpg"))
```
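
The `predict` helper returns only the top class, which hides how confident the model is. The sketch below shows one possible way to turn the raw outputs into probabilities and flag low-confidence cases. It assumes the model emits two raw logits per image; `label_with_confidence`, the `"Uncertain"` label, and the `0.9` threshold are hypothetical choices made for this example. It is written in plain Python so it runs without a model; with tensors, `torch.softmax(output, dim=1)` gives the same probabilities.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ("Diagram", "Not Diagram")  # index 0 = Diagram, as in predict()


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: Sequence[float], threshold: float = 0.9) -> Tuple[str, float]:
    """Return (label, probability); the label is "Uncertain" below the threshold."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    if probs[idx] < threshold:
        return "Uncertain", probs[idx]
    return LABELS[idx], probs[idx]
```

Inside `predict`, the logits for a single image could be passed as `output[0].tolist()`.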



## Training Details

### Training Data

The model was trained using:
- **ChartQA dataset** (for diagram samples)
- **JasmineQiuqiu/diagrams_with_captions_2** (for diagram samples)
- **COCO dataset (subset)** (for non-diagram samples)

### Training Procedure

- **Pretrained model:** `microsoft/resnet-18`
- **Optimization:** Adam optimizer
- **Loss function:** Cross-entropy loss
- **Training duration:** Approx. X hours on an NVIDIA GPU

## Evaluation

### Testing Data & Metrics

- **Dataset:** Held-out test set from ChartQA, AI2D-RST, and COCO
- **Metrics:**
  - **Test Loss:** 0.0371
  - **Test Accuracy:** 99.08%
  - **Precision:** 0.9995
  - **Recall:** 0.9820
  - **F1 Score:** 0.9907
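
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the two numbers above. The helper name `f1_from` is introduced here for illustration only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_from(0.9995, 0.9820)
print(f"{f1:.4f}")  # prints 0.9907
```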

## Environmental Impact

- **Hardware Used:** NVIDIA A100 GPU
- **Compute Hours:** Approx. X hours
- **Estimated Carbon Emission:** Not reported; it can be estimated with the [MLCO2 Calculator](https://mlco2.github.io/impact#compute)

## Citation

If you use this model, please cite:

```bibtex
@misc{aya2025diaclass,
  author = {Aya Mohamed},
  title = {Diagram Classification Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Ayamohamed/diaclass-model}
}
```