---
license: mit
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- microsoft/swin-large-patch4-window7-224
pipeline_tag: image-classification
---
# Model Card for ChartDet

## Model Details

### Model Description

**ChartDet** is an implementation of the Swin Transformer block used by the ChartEye model, adapted for chart detection. While the ChartEye paper focuses on identifying specific chart types, this model is trained only to distinguish charts from non-chart images.

- **Developed by:** Stefano D’Angelo
- **Model type:** Image Classification (Swin Transformer)
- **Language(s) (NLP):** Not applicable
- **License:** MIT
- **Finetuned from model:** [microsoft/swin-large-patch4-window7-224](https://doi.org/10.48550/arXiv.2103.14030)

### Model Sources 

- **Repository:** [ChartDet GitHub Repository](https://github.com/stefanodangelo/ChartDet)
- **Paper:** [ChartEye Paper](https://arxiv.org/abs/2408.16123)

## Uses

### Direct Use

This model can be used to classify images into chart and non-chart categories directly.

### Downstream Use 

The model can be fine-tuned further for specific chart type classification tasks or integrated into applications for automated document analysis.

### Out-of-Scope Use

The model is not designed for identifying specific chart types (e.g., bar, line, pie) or for tasks outside chart detection.

### Bias, Risks, and Limitations

The model was trained on the ICPR2022 CHARTINFO, PACS, and DomainNet datasets, which may not fully represent all chart types or images in real-world scenarios. Potential biases include:

- **Dataset Bias:** The training datasets may underrepresent certain chart styles or image types, impacting model generalization.
- **Domain Limitations:** Performance may degrade on charts or images from unseen domains or with significant visual noise.
- **Misclassification Risk:** Non-chart images with chart-like features (e.g., diagrams) may occasionally be misclassified.

Users should carefully evaluate the model on their specific data to ensure compatibility and adjust as needed.

### Recommendations

- Users should ensure that input data matches the training domain to achieve optimal performance.
- Avoid using the model for unrelated image classification tasks without fine-tuning.

## How to Get Started with the Model

Use the following code snippet to get started:

```python
from huggingface_hub import snapshot_download
from transformers import AutoImageProcessor, AutoConfig, SwinForImageClassification
from torchvision import transforms
import torch

# Download the model
snapshot_download(repo_id='stefanodangelo/chartdet', local_dir='./', allow_patterns='*.pt')

# Load the model and processor
processor = AutoImageProcessor.from_pretrained("microsoft/swin-large-patch4-window7-224")
config = AutoConfig.from_pretrained("microsoft/swin-large-patch4-window7-224", num_labels=2)
model = SwinForImageClassification(config)

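device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.load_state_dict(torch.load("models/chart_detection_model.pt", map_location=device))
model.to(device)  # Keep the model on the same device as the input tensor
model.eval()  # Disable dropout and stochastic depth for inference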

# Define transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=processor.image_mean, std=processor.image_std)
])

id2class = {0: "Picture", 1: "Chart"}

# Prepare an image
from PIL import Image
image = Image.open("YOUR_IMG_PATH").convert("RGB")  # Force 3 channels (handles grayscale and RGBA inputs)
image_tensor = transform(image).unsqueeze(0).to(device)  # Add batch dimension and move to device

# Make prediction
with torch.no_grad():
    outputs = model(image_tensor).logits  # Get logits from the model
    predicted_class = torch.argmax(outputs, dim=1).item()  # Get the class index

print(f"Predicted class: {id2class[predicted_class]}")
```
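The snippet above reports only the arg-max class. Because chart-like non-chart images (such as diagrams) may be misclassified, it can help to convert the logits to probabilities and flag low-confidence predictions. The following is a minimal sketch in plain Python; the `classify` helper, the `"Uncertain"` label, and the `0.9` threshold are illustrative assumptions, not part of the released model.

```python
import math

ID2CLASS = {0: "Picture", 1: "Chart"}  # same mapping as in the snippet above


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, min_confidence=0.9):
    """Return (label, confidence).

    The threshold is a hypothetical value; tune it on your own data.
    """
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    label = ID2CLASS[idx] if probs[idx] >= min_confidence else "Uncertain"
    return label, probs[idx]
```

With the variables from the snippet above, this could be called as `classify(outputs[0].tolist())`.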

## Training Details

### Training Data

The model was trained on a combination of:

- [ICPR2022 CHARTINFO UB PMC competition dataset](https://www.kaggle.com/datasets/pranithchowdary/icpr-2022?resource=download-directory)
- [PACS dataset](https://paperswithcode.com/dataset/pacs)
- [DomainNet dataset](https://paperswithcode.com/dataset/domainnet)

### Training Procedure

The model was fine-tuned using the following setup:

#### Preprocessing

Images were preprocessed to match the input requirements of the Swin Transformer model (e.g., resizing, normalization).

#### Training Hyperparameters

- **Optimizer:** Adam
- **Loss Function:** CrossEntropyLoss
- **Batch Size:** 8
- **Epochs:** 12
- **Learning Rate:** 3e-6
- **Seed:** 42

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation used held-out subsets of the datasets listed above, obtained with an 80-20 split strategy.
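As an illustration of an 80-20 split, the sketch below shuffles a list of samples with a fixed seed and holds out the last 20%. This is an assumption about how such a split might be done, not the card's documented procedure; the helper name `split_80_20` is hypothetical, and the seed of 42 simply mirrors the training seed listed above.

```python
import random


def split_80_20(items, seed=42, train_fraction=0.8):
    """Shuffle deterministically, then split into (train, held_out)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Example with placeholder sample identifiers
train_ids, test_ids = split_80_20(range(100))
```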

#### Factors

Performance was assessed across images from diverse domains (e.g., charts vs. natural images).

#### Metrics

Evaluation metrics included:

- Accuracy
- Confusion Matrix
- Classification Report (e.g., Precision, Recall, F1-Score)
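For reference, the metrics listed above can all be derived from a confusion matrix. The sketch below uses plain Python and support-weighted averaging; it is not the evaluation code used for this model, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes=2):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def weighted_report(matrix):
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    report = {"accuracy": correct / total, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in range(len(matrix)):
        tp = matrix[c][c]
        support = sum(matrix[c])                  # true samples of class c
        predicted = sum(row[c] for row in matrix)  # samples predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total
        report["precision"] += weight * precision
        report["recall"] += weight * recall
        report["f1"] += weight * f1
    return report


# Toy labels: 0 = Picture, 1 = Chart
matrix = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1])
report = weighted_report(matrix)
```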

### Results

Results indicate effective performance in distinguishing between charts and non-chart images. Quantitative results are as follows:
- Accuracy: 99.89%
- Precision (Weighted): 99.80%
- Recall (Weighted): 99.93%
- F1-Score (Weighted): 99.87%

#### Summary

The model achieves reliable classification for the intended task within the training domain.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA GeForce RTX 4070 Ti
- **Hours used:** ~2.47 hours (total training time: 12 epochs x ~740 seconds per epoch = ~8880 seconds)
- **Cloud Provider:** Local (no cloud provider used)
- **Compute Region:** Not applicable (local machine)
- **Carbon Emitted:** Not computed (local compute environment without data on power source or emissions factor)
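Although emissions were not computed, a rough estimate follows from energy (average power × time) multiplied by the carbon intensity of the local grid. The sketch below shows the arithmetic only. The 285 W power draw and the 0.4 kg CO2/kWh intensity are placeholder assumptions, not measured values, and should be replaced with figures for the actual machine and location.

```python
def estimate_emissions_kg(seconds, avg_power_watts, grid_kg_co2_per_kwh):
    """Estimate emissions as energy (kWh) times grid carbon intensity."""
    hours = seconds / 3600
    energy_kwh = (avg_power_watts / 1000) * hours
    return energy_kwh * grid_kg_co2_per_kwh


training_seconds = 12 * 740  # 12 epochs x ~740 s per epoch, as reported above

# Placeholder inputs (assumptions): GPU-only draw and a generic grid intensity
estimate = estimate_emissions_kg(
    training_seconds,
    avg_power_watts=285,
    grid_kg_co2_per_kwh=0.4,
)
print(f"Estimated emissions: {estimate:.2f} kg CO2")
```

This counts GPU power only; CPU, memory, and cooling would add to the total.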

## Technical Specifications 

### Model Architecture and Objective

The model uses a Swin Transformer-based architecture adapted for binary image classification.

### Compute Infrastructure

#### Hardware

- NVIDIA GeForce RTX 4070 Ti

#### Software

- Windows 11
- Python 3.11
- CUDA 11.7
- HuggingFace Transformers Library
- PyTorch & TorchVision

## Citation 

**BibTeX:**

```bibtex
@misc{chartdet2025,
  author = {Stefano D’Angelo},
  title = {ChartDet: Finetuned Swin Transformer for Chart Classification},
  year = {2025},
  howpublished = {\url{https://huggingface.co/stefanodangelo/chartdet}}
}
```

## Model Card Authors 

Stefano D’Angelo