license: apache-2.0
tags:
  - image-classification
  - checkbox-detection
  - computer-vision
  - pytorch
datasets:
  - wendys-llc/chkbx
metrics:
  - accuracy
library_name: pytorch

Checkbox State Classifier

This model classifies an image of a checkbox as checked or unchecked.

Model Details

  • Architecture: EfficientNetV2-S (PyTorch)
  • Input Size: 128x128 RGB images
  • Output: Binary classification (unchecked: 0, checked: 1)
  • Validation Accuracy: 97.1%
  • Training: Mixed precision on A100 GPU

Quick Start

import torch
from PIL import Image
from torchvision import transforms
from huggingface_hub import hf_hub_download
import torch.nn as nn
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

# Define model architecture
class EfficientNetV2Classifier(nn.Module):
    def __init__(self, num_classes=2, dropout_rate=0.3):
        super().__init__()
        self.backbone = efficientnet_v2_s(
            weights=EfficientNet_V2_S_Weights.IMAGENET1K_V1
        )
        num_features = self.backbone.classifier[1].in_features
        self.backbone.classifier = nn.Sequential(
            nn.Dropout(dropout_rate),
            nn.Linear(num_features, 512),
            nn.SiLU(inplace=True),
            nn.BatchNorm1d(512),
            nn.Dropout(dropout_rate),
            nn.Linear(512, 256),
            nn.SiLU(inplace=True),
            nn.BatchNorm1d(256),
            nn.Dropout(dropout_rate/2),
            nn.Linear(256, num_classes)
        )
    
    def forward(self, x):
        return self.backbone(x)

# Download and load model
model_path = hf_hub_download(
    repo_id="wendys-llc/checkbox-classifier",
    filename="checkbox_classifier.pth",
)
checkpoint = torch.load(model_path, map_location='cpu')

model = EfficientNetV2Classifier(num_classes=2)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Image preprocessing
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Predict
def predict(image_path):
    image = Image.open(image_path).convert('RGB')
    input_tensor = transform(image).unsqueeze(0)
    
    with torch.no_grad():
        output = model(input_tensor)
        probabilities = torch.nn.functional.softmax(output, dim=1)
        predicted = torch.argmax(probabilities, dim=1).item()
        confidence = probabilities[0][predicted].item()
    
    labels = {0: "unchecked", 1: "checked"}
    return labels[predicted], confidence

# Example usage
result, conf = predict("checkbox.jpg")
print(f"Result: {result} (confidence: {conf:.1%})")
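
Because predict() returns a confidence alongside the label, low-confidence results can be routed to manual review instead of being trusted outright. The following is a minimal sketch of that idea, not part of the released model: the triage helper, the 0.9 cutoff, and the file names are hypothetical, and the sample values are made up to mimic the (label, confidence) shape that predict() returns.

```python
def triage(results, min_confidence=0.9):
    """Split {name: (label, confidence)} into accepted and needs-review dicts.

    min_confidence is a hypothetical cutoff; tune it on your own data.
    """
    accepted, review = {}, {}
    for name, (label, confidence) in results.items():
        target = accepted if confidence >= min_confidence else review
        target[name] = (label, confidence)
    return accepted, review


# Made-up outputs in the (label, confidence) shape returned by predict()
results = {
    "box_01.jpg": ("checked", 0.998),
    "box_02.jpg": ("unchecked", 0.62),
    "box_03.jpg": ("unchecked", 0.97),
}

accepted, review = triage(results)
print("accepted:", sorted(accepted))
print("needs review:", sorted(review))
```

In practice the results dict would be built by calling predict() on each cropped checkbox image. A suitable cutoff depends on how costly a wrong answer is for the application.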

Training Dataset

Trained on the wendys-llc/chkbx dataset
(https://huggingface.co/datasets/wendys-llc/chkbx), which contains
~6,000 annotated checkbox images.

Limitations

- Trained specifically on UI checkboxes; it may not work well on
hand-drawn checkmarks
- Best performance on clear, high-contrast checkbox images
- Input images are resized to 128x128, so very small checkboxes
may lose detail
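
Since transforms.Resize((128, 128)) stretches any input to a square, a non-square crop around a checkbox gets distorted before it reaches the model. One possible mitigation, sketched below under assumptions, is to expand the crop to a square before resizing. The square_crop_box helper and its margin parameter are hypothetical, and whether this improves accuracy for this model has not been measured here.

```python
def square_crop_box(box, image_size, margin=0.1):
    """Expand a (left, top, right, bottom) box to a square inside the image.

    margin adds a fraction of the longer side as padding on each side.
    The square is shifted, and shrunk if needed, to stay within the image.
    """
    left, top, right, bottom = box
    img_w, img_h = image_size

    # Side length of the square: the longer edge plus padding, capped by the image
    side = max(right - left, bottom - top) * (1 + 2 * margin)
    side = min(side, img_w, img_h)

    # Center the square on the original box, then clamp it to the image bounds
    cx, cy = (left + right) / 2, (top + bottom) / 2
    new_left = min(max(cx - side / 2, 0), img_w - side)
    new_top = min(max(cy - side / 2, 0), img_h - side)

    return (
        round(new_left),
        round(new_top),
        round(new_left + side),
        round(new_top + side),
    )


# A 20x10 box in a 100x100 image becomes a 20x20 square around the same center
print(square_crop_box((10, 10, 30, 20), (100, 100), margin=0.0))
```

The returned tuple can be passed to PIL's Image.crop() before applying the transform from Quick Start.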