---
license: apache-2.0
tags:
- image-classification
- checkbox-detection
- computer-vision
- pytorch
datasets:
- wendys-llc/chkbx
metrics:
- accuracy
library_name: pytorch
---
# Checkbox State Classifier

This model classifies whether a checkbox is checked or unchecked.

## Model Details

- **Architecture**: EfficientNetV2-S (PyTorch)
- **Input Size**: 128x128 RGB images
- **Output**: Binary classification (unchecked: 0, checked: 1)
- **Validation Accuracy**: 97.1%
- **Training**: Mixed precision on an A100 GPU (sketched below)

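
The training loop itself is not published. For reference, here is a minimal sketch of what a mixed-precision training step looks like in PyTorch; the model, optimizer, criterion, and batch are placeholders, not the actual setup:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Hypothetical mixed-precision training step -- the real training code for
# this model is not published; all names here are placeholders.
scaler = GradScaler()

def train_step(model, optimizer, criterion, images, targets):
    optimizer.zero_grad()
    with autocast():                # run the forward pass in float16 where safe
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()   # scale the loss to avoid float16 underflow
    scaler.step(optimizer)          # unscale gradients, then step the optimizer
    scaler.update()                 # adjust the loss scale for the next step
    return loss.item()
```
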
## Quick Start

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights
from huggingface_hub import hf_hub_download

# Define model architecture
class EfficientNetV2Classifier(nn.Module):
    def __init__(self, num_classes=2, dropout_rate=0.3):
        super().__init__()
        self.backbone = efficientnet_v2_s(weights=EfficientNet_V2_S_Weights.IMAGENET1K_V1)
        num_features = self.backbone.classifier[1].in_features
        self.backbone.classifier = nn.Sequential(
            nn.Dropout(dropout_rate),
            nn.Linear(num_features, 512),
            nn.SiLU(inplace=True),
            nn.BatchNorm1d(512),
            nn.Dropout(dropout_rate),
            nn.Linear(512, 256),
            nn.SiLU(inplace=True),
            nn.BatchNorm1d(256),
            nn.Dropout(dropout_rate / 2),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        return self.backbone(x)

# Download and load model
model_path = hf_hub_download(repo_id="wendys-llc/checkbox-classifier",
                             filename="checkbox_classifier.pth")
checkpoint = torch.load(model_path, map_location='cpu')

model = EfficientNetV2Classifier(num_classes=2)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Image preprocessing
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Predict
def predict(image_path):
    image = Image.open(image_path).convert('RGB')
    input_tensor = transform(image).unsqueeze(0)

    with torch.no_grad():
        output = model(input_tensor)
        probabilities = torch.nn.functional.softmax(output, dim=1)

    predicted = torch.argmax(probabilities, dim=1).item()
    confidence = probabilities[0][predicted].item()

    labels = {0: "unchecked", 1: "checked"}
    return labels[predicted], confidence

# Example usage
result, conf = predict("checkbox.jpg")
print(f"Result: {result} (confidence: {conf:.1%})")
```
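
When classifying many crops, batching is faster than calling `predict` once per image. A sketch reusing the `model` and `transform` defined above; `paths` is a hypothetical list of image file paths:

```python
# Batched inference sketch; reuses `model` and `transform` defined above.
def predict_batch(paths, batch_size=32):
    labels = {0: "unchecked", 1: "checked"}
    results = []
    for i in range(0, len(paths), batch_size):
        # Stack the preprocessed crops into a single (B, 3, 128, 128) tensor
        batch = torch.stack([
            transform(Image.open(p).convert('RGB'))
            for p in paths[i:i + batch_size]
        ])
        with torch.no_grad():
            probs = torch.nn.functional.softmax(model(batch), dim=1)
        conf, pred = probs.max(dim=1)
        results.extend(
            (labels[p.item()], c.item()) for p, c in zip(pred, conf)
        )
    return results
```
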
## Training Dataset

Trained on the [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx) dataset, which contains ~6,000 annotated checkbox images.
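
The dataset can be pulled with the `datasets` library. A minimal sketch, assuming the standard loading API; check the dataset card for the actual split names and column schema:

```python
from datasets import load_dataset

# Split names and columns are assumptions -- see the dataset card.
ds = load_dataset("wendys-llc/chkbx")
print(ds)
```
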
## Limitations

- Trained specifically on UI checkboxes; may not work well on hand-drawn checkmarks
- Best performance on clear, high-contrast checkbox images
- Input images are resized to 128x128, so very small checkboxes may lose detail