---
license: apache-2.0
tags:
- image-classification
- computer-vision
- checkbox-detection
- efficientnet
datasets:
- wendys-llc/chkbx
metrics:
- accuracy
- f1
- precision
- recall
base_model: google/efficientnet-b0
model-index:
- name: checkbox-classifier-efficientnet
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      type: wendys-llc/chkbx
      name: Checkbox Detection Dataset
      split: validation
    metrics:
    - type: accuracy
      value: 0.97
      name: Validation Accuracy
library_name: transformers
pipeline_tag: image-classification
---

# Checkbox State Classifier - EfficientNet-B0

A fine-tuned EfficientNet-B0 model for binary classification of checkbox states (checked/unchecked). It reaches ~95% accuracy on the validation set of UI checkbox images.

## Model Description

This model is fine-tuned from [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0) on the [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx) dataset. It's designed to classify UI checkboxes in screenshots and interface images.

### Key Features
- **No `trust_remote_code` required** - Uses native transformers support
- **Fast inference** - EfficientNet-B0 is the smallest variant in the EfficientNet family
- **High accuracy** - ~95% on validation set
- **Simple API** - Works with transformers pipeline out of the box

## Usage

### Quick Start with Pipeline (Recommended)

```python
from transformers import pipeline
from PIL import Image

# Load the model
classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

# Classify an image
image = Image.open("checkbox.jpg")
results = classifier(image)

# Print results
for result in results:
    print(f"{result['label']}: {result['score']:.2%}")

# Get just the top prediction
top_result = classifier(image, top_k=1)[0]
print(f"Checkbox is: {top_result['label']} (confidence: {top_result['score']:.2%})")
```
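
When a prediction feeds an automated workflow, it can help to treat low-confidence results as undecided rather than trusting the top label. The sketch below works on the list of `{"label", "score"}` dicts the pipeline returns for one image. The helper name `decide_state`, the `0.8` threshold, and the example scores are illustrative assumptions, not part of this model's API, and the label strings are assumed to be `checked` and `unchecked`.

```python
def decide_state(results, threshold=0.8):
    """Return the top label, or "uncertain" if its score is below `threshold`.

    `results` is the list of {"label": ..., "score": ...} dicts that the
    image-classification pipeline returns for a single image.
    """
    top = max(results, key=lambda r: r["score"])
    if top["score"] < threshold:
        return "uncertain"
    return top["label"]


# Hypothetical pipeline output for one image (scores are made up).
example = [
    {"label": "checked", "score": 0.93},
    {"label": "unchecked", "score": 0.07},
]
print(decide_state(example))
```

With a real image, the call would be `decide_state(classifier(image))`. A suitable threshold depends on the application and is best chosen on held-out data.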

### Using AutoModel and AutoImageProcessor

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch
from PIL import Image

# Load model and processor
processor = AutoImageProcessor.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")
model = AutoModelForImageClassification.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")

# Prepare image (convert to RGB so grayscale or RGBA files become 3-channel input)
image = Image.open("checkbox.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    
    # Get predicted class
    predicted_class_idx = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_class_idx]
    
    # Get confidence scores
    probabilities = torch.nn.functional.softmax(logits, dim=-1)
    confidence = probabilities.max().item()

print(f"Prediction: {predicted_label} (confidence: {confidence:.2%})")
```

### Batch Processing

```python
from transformers import pipeline
from PIL import Image

classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

# Process multiple images
images = [Image.open(f"checkbox_{i}.jpg") for i in range(1, 4)]
results = classifier(images)

for i, result in enumerate(results):
    top_pred = result[0]  # Get top prediction
    print(f"Image {i+1}: {top_pred['label']} ({top_pred['score']:.2%})")
```
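
For a full screenshot with several checkboxes, one option is to crop each checkbox region and classify the crops as a batch. The sketch below only computes the crop boxes. It assumes the checkbox locations are already known (for example from a separate detector or UI metadata) and that the classifier works best on a crop around a single checkbox. The helper `padded_box`, the 4-pixel padding, and the coordinates are hypothetical.

```python
def padded_box(box, pad, width, height):
    """Expand (left, top, right, bottom) by `pad` pixels, clamped to the image."""
    left, top, right, bottom = box
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(width, right + pad),
        min(height, bottom + pad),
    )


# Hypothetical checkbox locations in a 1280x720 screenshot.
boxes = [(10, 10, 30, 30), (1260, 700, 1280, 720)]
crop_boxes = [padded_box(b, pad=4, width=1280, height=720) for b in boxes]
print(crop_boxes)  # [(6, 6, 34, 34), (1256, 696, 1280, 720)]
```

Each tuple can be passed to PIL's `Image.crop`, and the crops can then go to the pipeline as a list, e.g. `classifier([screenshot.crop(b) for b in crop_boxes])`.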

## Model Details

### Architecture
- **Base Model**: google/efficientnet-b0
- **Model Type**: EfficientNet for Image Classification
- **Number of Labels**: 2 (checked, unchecked)
- **Input Size**: 224x224 RGB images
- **Framework**: PyTorch via Transformers

### Training Details
- **Dataset**: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
  - ~4,800 training samples
  - ~1,200 validation samples
- **Training Configuration**:
  - Epochs: 15 (with early stopping)
  - Batch Size: 64 (on A100)
  - Learning Rate: Default AdamW
  - Mixed Precision: FP16
  - Hardware: NVIDIA A100 GPU
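
The metadata for this model lists accuracy, F1, precision, and recall as metrics. As a rough illustration of how these relate for the two-class task, the sketch below computes them from lists of true and predicted labels. It is not the evaluation code used for this model; the function name `binary_metrics`, the choice of `checked` as the positive class, and the toy labels are assumptions.

```python
def binary_metrics(y_true, y_pred, positive="checked"):
    """Compute accuracy, precision, recall, and F1 for a binary label task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not results from this model.
y_true = ["checked", "checked", "unchecked", "unchecked"]
y_pred = ["checked", "unchecked", "unchecked", "unchecked"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case one checked box is missed, so precision stays at 1.0 while recall drops to 0.5, which shows why accuracy alone can hide errors on one class.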

## Acknowledgments

- Base model: [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0)
- Dataset: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
- Framework: [HuggingFace Transformers](https://github.com/huggingface/transformers)

## License

This model is licensed under the Apache 2.0 License. See the [Apache License 2.0 text](https://www.apache.org/licenses/LICENSE-2.0) for details.