anuashok/ocr-captcha-v3

This model is a fine-tuned version of microsoft/trocr-base-printed, trained on CAPTCHA images of the type shown below.

[Image: example CAPTCHA]

[Image: example CAPTCHA]

Training Summary

  • CER (Character Error Rate): 0.01394585726004922 (about 1.4% of characters)
  • Hyperparameters:
    • Learning Rate: 1.5078922700531405e-05
    • Batch Size: 16
    • Num Epochs: 7
    • Warmup Ratio: 0.14813004670666596
    • Weight Decay: 0.017176551931326833
    • Num Beams: 2
    • Length Penalty: 1.3612823161368288
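
The card does not say how the CER above was computed. As a rough illustration, the sketch below uses the common definition: total character-level edit distance between predictions and references, divided by the total number of reference characters. The function names `levenshtein` and `cer` are hypothetical, and the corpus-level aggregation is an assumption, not necessarily what was used during training.

```python
from typing import Sequence


def levenshtein(prediction: str, reference: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `prediction` into `reference`."""
    prev = list(range(len(reference) + 1))
    for i, p_char in enumerate(prediction, start=1):
        curr = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if p_char == r_char else 1
            curr.append(min(prev[j] + 1,          # delete from prediction
                            curr[j - 1] + 1,      # insert into prediction
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Corpus-level character error rate (assumed definition)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


# One wrong character out of eight reference characters.
print(cer(["abd12", "xyz"], ["abc12", "xyz"]))  # 0.125
```

Under this definition, a CER of roughly 0.014 corresponds to about one character error per 70 reference characters.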

Usage

from transformers import VisionEncoderDecoderModel, TrOCRProcessor
import torch
from PIL import Image

# Load model and processor
processor = TrOCRProcessor.from_pretrained("anuashok/ocr-captcha-v3")
model = VisionEncoderDecoderModel.from_pretrained("anuashok/ocr-captcha-v3")

# Load the image as RGBA so that any transparency is preserved
image_path = "path_to_your_image.png"
image = Image.open(image_path).convert("RGBA")
# Composite onto an opaque white background, then drop the alpha channel
background = Image.new("RGBA", image.size, (255, 255, 255, 255))
combined = Image.alpha_composite(background, image).convert("RGB")

# Prepare image
pixel_values = processor(combined, return_tensors="pt").pixel_values

# Generate text
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
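
The Training Summary lists Num Beams: 2 and Length Penalty: 1.3612823161368288, while the snippet above calls `model.generate` without arguments. The following is a minimal sketch of passing those values explicitly. It assumes they are meant as decoding settings and may not already be stored in the model's saved generation configuration. `GENERATION_KWARGS` and `decode_captcha` are hypothetical names introduced for illustration; `num_beams` and `length_penalty` are standard `generate` arguments in transformers.

```python
# Decoding settings copied from the Training Summary (assumed to apply at inference).
GENERATION_KWARGS = {
    "num_beams": 2,
    "length_penalty": 1.3612823161368288,
}


def decode_captcha(model, processor, pixel_values, **overrides):
    """Run beam-search decoding and return the first decoded string.

    `model` and `processor` are the objects loaded in the usage snippet;
    keyword `overrides` replace the defaults in GENERATION_KWARGS.
    """
    kwargs = {**GENERATION_KWARGS, **overrides}
    generated_ids = model.generate(pixel_values, **kwargs)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

With the objects from the usage snippet, this would be called as `decode_captcha(model, processor, pixel_values)`.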

Model size: 334M params (Safetensors, tensor type F32)