File size: 4,207 Bytes
96ffc7e e47b8cc 96ffc7e dcef806 96ffc7e e47b8cc 96ffc7e 5ff2677 9230778 96ffc7e 5ff2677 96ffc7e 5ff2677 96ffc7e 5ff2677 96ffc7e 5ff2677 96ffc7e e47b8cc 96ffc7e 22b8a16 96ffc7e |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 |
---
library_name: torch
tags:
- image-classification
- resnet
- diagrams
- pytorch
- computer-vision
license: apache-2.0
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- microsoft/resnet-18
pipeline_tag: image-classification
datasets:
- phiyodr/coco2017
- HuggingFaceM4/ChartQA
- JasmineQiuqiu/diagrams_with_captions_2
---
# Model Card for Diagram Classification Model
## Model Details
### Model Description
This is a fine-tuned ResNet-18 model trained for binary image classification, distinguishing between **diagrams** and **non-diagrams**. The model is designed for use in applications that need automatic filtering or processing of diagram-based content.
- **Developed by:** Aya Mohamed
- **Model type:** ResNet-18 (Fine-tuned for image classification)
- **Language(s) (NLP):** Not applicable (Computer Vision model)
- **License:** Apache 2.0
- **Finetuned from model:** `microsoft/resnet-18`
### Model Sources
- **Repository:** [Ayamohamed/diaclass-model](https://huggingface.co/Ayamohamed/diaclass-model)
## Uses
### Direct Use
This model is intended for classifying images as **diagrams** or **non-diagrams**. It can be used in:
- **Document processing** (extracting diagrams from PDFs or scanned documents)
- **Chart-based visual question generation (VQG)**
- **Content moderation** (filtering diagram images from general image datasets)
### Out-of-Scope Use
- Not suitable for **multi-class classification** beyond diagrams vs. non-diagrams.
- Not designed for **hand-drawn sketches** or **complex figures with mixed elements**.
## Bias, Risks, and Limitations
- The model's accuracy depends on the training dataset, which may not cover all possible diagram styles.
- May misclassify **charts, blueprints, or artistic drawings** if they resemble diagrams.
### Recommendations
Users should **evaluate the model** on their specific dataset before deployment to ensure it performs well in their context.
## 🚀 How to Use
### **1️⃣ Load the Model from Hugging Face**
You can download the model and load it using `torch`.
```python
import torch
from huggingface_hub import hf_hub_download
# Download model from Hugging Face Hub
model_path = hf_hub_download(repo_id="Ayamohamed/DiaClassification", filename="model.pth")
# Load model
model_hg = torch.load(model_path)
model_hg.eval() # Set to evaluation mode
```
### **2️⃣ Preprocess and Classify an Image**
```python
from PIL import Image
from torchvision import transforms
# Define Image Transformations
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
def predict(image_path):
image = Image.open(image_path).convert("RGB")
image = transform(image).unsqueeze(0)
with torch.no_grad():
output = model_hg(image)
class_idx = torch.argmax(output, dim=1).item()
return "Diagram" if class_idx == 0 else "Not Diagram"
# Example usage
print(predict("my-diagram-classifier/31188_1536932698.jpg"))
```
## Training Details
### Training Data
The model was trained using:
- **ChartQA dataset** (for diagram samples)
- **JasmineQiuqiu/diagrams_with_captions_2** (for diagram samples)
- **COCO dataset (subset)** (for non-diagram samples)
### Training Procedure
- **Pretrained model:** `microsoft/resnet-18`
- **Optimization:** Adam optimizer
- **Loss function:** Cross-entropy loss
- **Training duration:** Approx. X hours on an NVIDIA GPU
## Evaluation
### Testing Data & Metrics
- **Dataset:** Held-out test set from ChartQA, AI2D-RST, and COCO
- **Metrics:**
- **Test Loss:** 0.0371
- **Test Accuracy:** 99.08%
- **Precision:** 0.9995
- **Recall:** 0.9820
- **F1 Score:** 0.9907
## Environmental Impact
- **Hardware Used:** NVIDIA A100 GPU
- **Compute Hours:** Approx. X hours
- **Estimated Carbon Emission:** [Use MLCO2 Calculator](https://mlco2.github.io/impact#compute)
## Citation
If you use this model, please cite:
```bibtex
@misc{aya2025diaclass,
author = {Aya Mohamed},
title = {Diagram Classification Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/Ayamohamed/diaclass-model}
}
``` |