---
license: apache-2.0
tags:
- image-classification
- computer-vision
- checkbox-detection
- efficientnet
datasets:
- wendys-llc/chkbx
metrics:
- accuracy
- f1
- precision
- recall
base_model: google/efficientnet-b0
model-index:
- name: checkbox-classifier-efficientnet
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      type: wendys-llc/chkbx
      name: Checkbox Detection Dataset
      split: validation
    metrics:
    - type: accuracy
      value: 0.97
      name: Validation Accuracy
library_name: transformers
pipeline_tag: image-classification
---

# Checkbox State Classifier - EfficientNet-B0

A fine-tuned EfficientNet-B0 model for binary classification of checkbox states (checked/unchecked). It reaches ~95% accuracy on the validation set of UI checkbox images.

## Model Description

This model is fine-tuned from [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0) on the [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx) dataset. It's designed to classify UI checkboxes in screenshots and interface images.

### Key Features
- **No `trust_remote_code` required** - Uses native transformers support
- **Fast inference** - EfficientNet-B0 is optimized for speed
- **High accuracy** - ~95% on validation set
- **Simple API** - Works with transformers pipeline out of the box

## Usage

### Quick Start with Pipeline (Recommended)

```python
from transformers import pipeline
from PIL import Image

# Load the model
classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

# Classify an image
image = Image.open("checkbox.jpg")
results = classifier(image)

# Print results
for result in results:
    print(f"{result['label']}: {result['score']:.2%}")

# Get just the top prediction
top_result = classifier(image, top_k=1)[0]
print(f"Checkbox is: {top_result['label']} (confidence: {top_result['score']:.2%})")
```

### Using AutoModel and AutoImageProcessor

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch
from PIL import Image

# Load model and processor
processor = AutoImageProcessor.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")
model = AutoModelForImageClassification.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")

# Prepare image
image = Image.open("checkbox.jpg").convert("RGB")  # force 3 channels so grayscale/RGBA files also work
inputs = processor(images=image, return_tensors="pt")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    
    # Get predicted class
    predicted_class_idx = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_class_idx]
    
    # Get confidence scores
    probabilities = torch.nn.functional.softmax(logits, dim=-1)
    confidence = probabilities.max().item()

print(f"Prediction: {predicted_label} (confidence: {confidence:.2%})")
```
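
The confidence printed above is the largest softmax probability over the two class logits. As a rough illustration of that step only, here is a dependency-free sketch; the logit values are made up for the example and are not real model output, and the `softmax` helper is a stand-in for `torch.nn.functional.softmax`.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a single image (one value per class).
logits = [2.0, -1.0]

probs = softmax(logits)
confidence = max(probs)                  # same role as probabilities.max()
predicted_idx = probs.index(confidence)  # same role as logits.argmax(-1)

print(f"class index {predicted_idx}, confidence {confidence:.2%}")
```

With only two classes, the confidence depends solely on the difference between the two logits, so a small gap means a prediction close to 50%.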

### Batch Processing

```python
from transformers import pipeline
from PIL import Image

classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

# Process multiple images
images = [Image.open(f"checkbox_{i}.jpg") for i in range(1, 4)]
results = classifier(images)

for i, result in enumerate(results):
    top_pred = result[0]  # Get top prediction
    print(f"Image {i+1}: {top_pred['label']} ({top_pred['score']:.2%})")
```
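
In practice it can help to treat low-confidence predictions as undecided rather than forcing a checked/unchecked answer. The sketch below post-processes results in the same list-of-dicts shape the pipeline returns. The `decide` helper, the 0.9 threshold, and the sample scores are all hypothetical, and the label string `"checked"` is assumed to match the model's labels (verify against `model.config.id2label`).

```python
from typing import Dict, List, Optional, Union

Prediction = Dict[str, Union[str, float]]


def decide(predictions: List[Prediction], threshold: float = 0.9) -> Optional[bool]:
    """Return True (checked), False (unchecked), or None if below the threshold."""
    best = max(predictions, key=lambda p: p["score"])
    if best["score"] < threshold:
        return None  # not confident enough to commit to either state
    return best["label"] == "checked"  # assumed label name


# Hand-written examples shaped like pipeline output (not real model output).
sample_outputs = [
    [{"label": "checked", "score": 0.98}, {"label": "unchecked", "score": 0.02}],
    [{"label": "unchecked", "score": 0.95}, {"label": "checked", "score": 0.05}],
    [{"label": "checked", "score": 0.55}, {"label": "unchecked", "score": 0.45}],
]

decisions = [decide(preds) for preds in sample_outputs]
print(decisions)
```

To use it with the batch example, pass each element of `results` to `decide` and route `None` cases to manual review.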

## Model Details

### Architecture
- **Base Model**: google/efficientnet-b0
- **Model Type**: EfficientNet for Image Classification
- **Number of Labels**: 2 (checked, unchecked)
- **Input Size**: 224x224 RGB images
- **Framework**: PyTorch via Transformers

### Training Details
- **Dataset**: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
  - ~4,800 training samples
  - ~1,200 validation samples
- **Training Configuration**:
  - Epochs: 15 (with early stopping)
  - Batch Size: 64 (on A100)
  - Learning Rate: default (AdamW optimizer)
  - Mixed Precision: FP16
  - Hardware: NVIDIA A100 GPU
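
For a rough sense of training length, the figures above imply an upper bound on optimizer steps. This is back-of-the-envelope arithmetic only: it takes the approximate sample count at face value and assumes a single GPU, no gradient accumulation, and that the final partial batch is kept. Early stopping may end training before this bound is reached.

```python
import math

train_samples = 4_800  # approximate training set size from the list above
batch_size = 64
max_epochs = 15

# Batches per pass over the training data (round up for a partial last batch).
steps_per_epoch = math.ceil(train_samples / batch_size)

# Upper bound on optimizer steps if early stopping never triggers.
max_steps = steps_per_epoch * max_epochs

print(f"{steps_per_epoch} steps/epoch, at most {max_steps} steps")
```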

## Acknowledgments

- Base model: [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0)
- Dataset: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
- Framework: [HuggingFace Transformers](https://github.com/huggingface/transformers)

## License

This model is licensed under the Apache 2.0 License. See the [Apache License 2.0 text](https://www.apache.org/licenses/LICENSE-2.0) for details.