TextNet-T/S/B: Efficient Text Detection Models

Overview

TextNet is a lightweight and efficient backbone architecture designed specifically for text detection, offering a better accuracy–speed trade-off than general-purpose backbones such as MobileNetV3. It comes in three variants, TextNet-T, TextNet-S, and TextNet-B, with 6.8M, 8.0M, and 8.9M parameters respectively, striking an excellent balance between accuracy and inference speed.
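The three variants share the same Transformers backbone API; only the checkpoint name changes. Below is a minimal sketch for loading each variant and counting its parameters. The tiny and small checkpoint ids are assumptions made for illustration (only the base checkpoint is referenced in this card), so substitute the ids actually published on the Hub.

from transformers import AutoBackbone

# Map each variant to a Hub checkpoint. The tiny and small ids are assumed
# for illustration; only the base checkpoint is referenced in this card.
checkpoints = {
    "TextNet-T": "jadechoghari/textnet-tiny",
    "TextNet-S": "jadechoghari/textnet-small",
    "TextNet-B": "jadechoghari/textnet-base",
}

for name, checkpoint in checkpoints.items():
    model = AutoBackbone.from_pretrained(checkpoint)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")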

Performance

TextNet achieves state-of-the-art results in text detection, outperforming hand-crafted backbones in both accuracy and inference speed. Its efficient architecture makes it well suited to real-time, GPU-based applications.

How to use

Transformers

pip install transformers

import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoBackbone

# Load a sample image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the TextNet backbone
processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")
model = AutoBackbone.from_pretrained("jadechoghari/textnet-base")

# Preprocess the image and run a forward pass
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
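The backbone returns multi-scale feature maps that a text detection head would consume. As a minimal follow-up sketch, you can inspect their shapes like this; the number and resolution of the returned maps depend on the checkpoint's backbone configuration.

# outputs.feature_maps is a tuple of tensors, one per returned backbone stage,
# each of shape (batch_size, channels, height, width)
for i, feature_map in enumerate(outputs.feature_maps):
    print(f"stage {i}: {tuple(feature_map.shape)}")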

Training

We first compare TextNet with representative hand-crafted backbones such as ResNets and VGG16. For a fair comparison, all models are first pre-trained on IC17-MLT [52] and then fine-tuned on Total-Text. The proposed TextNet models achieve a significantly better trade-off between accuracy and inference speed than previous hand-crafted models. Notably, TextNet-T, -S, and -B have only 6.8M, 8.0M, and 8.9M parameters respectively, making them more parameter-efficient than ResNets and VGG16. These results demonstrate that TextNet models are effective for text detection on GPU devices.
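To put the parameter counts in perspective, here is a rough sketch comparing the TextNet-B backbone checkpoint with torchvision's ResNet-50 and VGG16. Note that the torchvision models include their classification heads, and the count for the Hub checkpoint may differ from the paper's backbone-only figure, so treat this as an illustration rather than an exact reproduction of the paper's comparison.

from torchvision.models import resnet50, vgg16
from transformers import AutoBackbone

def count_params(model):
    return sum(p.numel() for p in model.parameters())

textnet = AutoBackbone.from_pretrained("jadechoghari/textnet-base")

# torchvision models are instantiated without pretrained weights; only the
# parameter counts matter here, and the classification heads are included.
print(f"TextNet-B backbone: {count_params(textnet) / 1e6:.1f}M params")
print(f"ResNet-50:          {count_params(resnet50()) / 1e6:.1f}M params")
print(f"VGG16:              {count_params(vgg16()) / 1e6:.1f}M params")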

Applications

Perfect for real-world text detection tasks, including:

  • Natural scene text recognition
  • Multi-lingual and multi-oriented text detection
  • Document text region analysis

Contribution

This model was contributed by Raghavan, jadechoghari and nielsr.
