---
license: apache-2.0
tags:
- image-classification
- computer-vision
- checkbox-detection
- efficientnet
datasets:
- wendys-llc/chkbx
metrics:
- accuracy
- f1
- precision
- recall
base_model: google/efficientnet-b0
model-index:
- name: checkbox-classifier-efficientnet
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      type: wendys-llc/chkbx
      name: Checkbox Detection Dataset
      split: validation
    metrics:
    - type: accuracy
      value: 0.97
      name: Validation Accuracy
library_name: transformers
pipeline_tag: image-classification
---
# Checkbox State Classifier - EfficientNet-B0
A fine-tuned EfficientNet-B0 model for binary classification of checkbox states (checked/unchecked). It reaches about 95% accuracy on the validation set when classifying UI checkboxes.
## Model Description
This model is fine-tuned from [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0) on the [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx) dataset. It's designed to classify UI checkboxes in screenshots and interface images.
### Key Features
- **No `trust_remote_code` required** - Uses native transformers support
- **Fast inference** - EfficientNet-B0 is the smallest variant in the EfficientNet B0-B7 family
- **High accuracy** - ~95% on validation set
- **Simple API** - Works with transformers pipeline out of the box
## Usage
### Quick Start with Pipeline (Recommended)
```python
from transformers import pipeline
from PIL import Image
# Load the model
classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")
# Classify an image
image = Image.open("checkbox.jpg")
results = classifier(image)
# Print results
for result in results:
    print(f"{result['label']}: {result['score']:.2%}")
# Get just the top prediction
top_result = classifier(image, top_k=1)[0]
print(f"Checkbox is: {top_result['label']} (confidence: {top_result['score']:.2%})")
```
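
The pipeline returns a list of `{"label": ..., "score": ...}` dictionaries, so downstream code often needs a rule for low-confidence predictions. The following is a minimal sketch of one such rule. The `decide` helper, the `"uncertain"` fallback value, and the `0.9` threshold are all hypothetical choices for illustration, not part of this model or of transformers.

```python
from typing import Dict, List


def decide(results: List[Dict], threshold: float = 0.9) -> str:
    """Return the top label, or "uncertain" if its score is below the threshold.

    `results` is expected in the pipeline's output format: a list of
    {"label": str, "score": float} dictionaries.
    """
    if not results:
        raise ValueError("results must contain at least one prediction")
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else "uncertain"


# Hand-written examples in the pipeline's output format (not real model output)
confident = [{"label": "checked", "score": 0.97}, {"label": "unchecked", "score": 0.03}]
borderline = [{"label": "checked", "score": 0.60}, {"label": "unchecked", "score": 0.40}]

print(decide(confident))
print(decide(borderline))
```

With a real classifier this would be called as `decide(classifier(image))`. The threshold should be tuned on your own data.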
### Using AutoModel and AutoImageProcessor
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch
from PIL import Image
# Load model and processor
processor = AutoImageProcessor.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")
model = AutoModelForImageClassification.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")
# Prepare image
image = Image.open("checkbox.jpg")
inputs = processor(images=image, return_tensors="pt")
# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
# Get predicted class
predicted_class_idx = logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_class_idx]
# Get confidence scores
probabilities = torch.nn.functional.softmax(logits, dim=-1)
confidence = probabilities.max().item()
print(f"Prediction: {predicted_label} (confidence: {confidence:.2%})")
```
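
To make the confidence step above concrete, here is a small pure-Python sketch of what softmax does to a pair of logits. The logit values are made up for illustration and are not real model output. Which index corresponds to which label depends on `model.config.id2label`.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a two-class model
logits = [2.0, -1.0]
probs = softmax(logits)

# The predicted class is the index with the highest probability
predicted_idx = probs.index(max(probs))
print(f"class index {predicted_idx}, confidence {probs[predicted_idx]:.2%}")
```

A logit gap of 3 already yields roughly 95% confidence, so high softmax scores on their own do not guarantee a correct prediction.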
### Batch Processing
```python
from transformers import pipeline
from PIL import Image
classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")
# Process multiple images
images = [Image.open(f"checkbox_{i}.jpg") for i in range(1, 4)]
results = classifier(images)
for i, result in enumerate(results):
    top_pred = result[0]  # Get top prediction
    print(f"Image {i+1}: {top_pred['label']} ({top_pred['score']:.2%})")
```
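
The example above opens every image before classifying. For larger collections it may be preferable to work through the files in fixed-size chunks. Below is a minimal sketch of that idea. The `chunked` helper, the file names, and the chunk size are hypothetical.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical file names, used only for illustration
paths = [f"checkbox_{i}.jpg" for i in range(1, 8)]

for chunk in chunked(paths, size=3):
    # With a real classifier, each chunk could be processed as:
    #   images = [Image.open(p) for p in chunk]
    #   results = classifier(images)
    print(chunk)
```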
## Model Details
### Architecture
- **Base Model**: google/efficientnet-b0
- **Model Type**: EfficientNet for Image Classification
- **Number of Labels**: 2 (checked, unchecked)
- **Input Size**: 224x224 RGB images
- **Framework**: PyTorch via Transformers
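
Because the model works on 224x224 inputs, a checkbox that occupies only a small part of a full screenshot is likely to be lost after resizing. One possible approach, sketched below, is to crop a padded square around the checkbox first. The `square_crop_box` helper and its `padding` parameter are hypothetical, and the sketch assumes you already know the checkbox's bounding box from some other source.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def square_crop_box(box: Box, image_size: Tuple[int, int], padding: float = 0.2) -> Box:
    """Return a padded square crop box around `box`, kept inside the image."""
    left, top, right, bottom = box
    width, height = image_size

    # Use the longer side so the crop is square, then pad on every edge.
    side = max(right - left, bottom - top)
    side = int(round(side * (1 + 2 * padding)))
    side = min(side, width, height)

    # Center the square on the original box, then shift it into bounds.
    center_x = (left + right) // 2
    center_y = (top + bottom) // 2
    new_left = max(0, min(center_x - side // 2, width - side))
    new_top = max(0, min(center_y - side // 2, height - side))
    return (new_left, new_top, new_left + side, new_top + side)


# Made-up coordinates for a 20x20 checkbox in an 800x600 screenshot
crop = square_crop_box((100, 100, 120, 120), image_size=(800, 600))
print(crop)
```

The returned tuple follows the `(left, upper, right, lower)` convention of Pillow's `Image.crop`, so the result can be passed as `image.crop(crop)` before classification.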
### Training Details
- **Dataset**: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
  - ~4,800 training samples
  - ~1,200 validation samples
- **Training Configuration**:
  - Epochs: 15 (with early stopping)
  - Batch Size: 64 (on A100)
  - Learning Rate: Default AdamW
  - Mixed Precision: FP16
  - Hardware: NVIDIA A100 GPU
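
The metadata for this model lists accuracy, F1, precision, and recall as metrics. For readers who want to evaluate on their own data, here is a minimal pure-Python sketch of how those four values are computed for binary labels. Treating `"checked"` as the positive class is an assumption made for this sketch. The original evaluation code is not shown here, and the labels below are made up.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "checked") -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for two-class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only
y_true = ["checked", "checked", "unchecked", "unchecked", "checked"]
y_pred = ["checked", "unchecked", "unchecked", "checked", "checked"]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```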
## Acknowledgments
- Base model: [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0)
- Dataset: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
- Framework: [HuggingFace Transformers](https://github.com/huggingface/transformers)
## License
This model is licensed under the Apache 2.0 License. See the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) text for details.