---
library_name: transformers
tags:
  - document-question-answering
  - layoutlmv3
  - ocr
  - document-understanding
  - paddleocr
  - multilingual
  - layout-aware
  - lakshya-singh
license: apache-2.0
language:
  - en
base_model:
  - microsoft/layoutlmv3-base
datasets:
  - nielsr/docvqa_1200_examples
---
Document QA Model
This is a document question-answering model fine-tuned from microsoft/layoutlmv3-base. It takes a document image together with OCR output (words and bounding boxes extracted with PaddleOCR) and extracts answers to questions about structured information in the document layout.
Model Details
Model Description
- Model Name: document-qa-model
- Base Model: microsoft/layoutlmv3-base
- Fine-tuned by: Lakshya Singh (solo contributor)
- Languages: English, Spanish, Chinese
- License: Apache-2.0 (inherited from base model)
- Intended Use: Extract answers to structured queries from scanned documents
- Funding: Not funded; this project was completed independently.
 
Model Sources
- Repository: https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model.git
- Trained on: Adapted version of nielsr/docvqa_1200_examples
- Model metrics: See

 
Uses
Direct Use
This model can be used for:
- Question answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
 
Out-of-Scope Use
- Not suitable for conversational QA
- Not suitable for images from which OCR extracts no text, since the model needs words and their bounding boxes as input
 
Training Details
Dataset
The dataset consisted of:
- Images of utility bills and documents
- OCR data with bounding boxes (from PaddleOCR)
- Queries in English, Spanish, and Chinese
- Answer spans with match scores and positions
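
Answer positions and match scores like these are typically obtained by searching the OCR words for the window that best matches the ground-truth answer string. The card does not say how its scores were computed, so the sketch below assumes `difflib.SequenceMatcher.ratio()` as the match score; `normalize` and `locate_answer` are hypothetical helpers, not code from the repository.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR spacing quirks do not dominate the score."""
    return " ".join(text.lower().split())


def locate_answer(words: list[str], answer: str, slack: int = 1) -> tuple[int, int, float]:
    """Return (start_word, end_word, score) for the OCR word window that best matches `answer`.

    Windows of (number of answer words) +/- `slack` words are compared with
    SequenceMatcher.ratio() (assumed scoring). end_word is inclusive.
    Returns (-1, -1, 0.0) when no window can be formed.
    """
    target = normalize(answer)
    n = max(1, len(target.split()))
    best = (-1, -1, 0.0)
    for length in range(max(1, n - slack), n + slack + 1):
        for start in range(len(words) - length + 1):
            window = normalize(" ".join(words[start:start + length]))
            score = SequenceMatcher(None, window, target).ratio()
            if score > best[2]:  # keep the first window with the highest score
                best = (start, start + length - 1, score)
    return best
```

Fuzzy matching rather than exact string search tolerates OCR errors: an OCR word such as `$56.7B` still matches the answer `$56.78`, just with a score below 1.0, which can then serve as a quality filter for training examples.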
 
Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, positions, and structure
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: Shown in image below
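
LayoutLMv3 expects one bounding box per word, with coordinates scaled to a 0-1000 range relative to the page width and height. The sketch below shows that preprocessing step under two assumptions: the OCR result follows the PaddleOCR 2.x `ocr.ocr(...)` layout (one `[quad, (text, confidence)]` entry per detected line, where `quad` is four `[x, y]` points), and every word in a line reuses the line's box, which is an approximation. The function names are hypothetical.

```python
def quad_to_box(quad):
    """Collapse a 4-point polygon [[x, y], ...] into an axis-aligned (x0, y0, x1, y1) box."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return min(xs), min(ys), max(xs), max(ys)


def normalize_box(box, width, height):
    """Scale pixel coordinates to LayoutLMv3's 0-1000 range, clamped to the page."""
    x0, y0, x1, y1 = box

    def scale(value, size):
        return max(0, min(1000, int(1000 * value / size)))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


def paddle_lines_to_words(lines, width, height, min_conf=0.0):
    """Turn PaddleOCR line results into parallel word and box lists.

    Each entry of `lines` is assumed to be [quad, (text, confidence)].
    Every word in a line reuses the line's box (approximation).
    """
    words, boxes = [], []
    for quad, (text, conf) in lines:
        if conf < min_conf:
            continue  # drop low-confidence OCR lines
        box = normalize_box(quad_to_box(quad), width, height)
        for word in text.split():
            words.append(word)
            boxes.append(box)
    return words, boxes
```

Splitting lines into words keeps the input format the LayoutLMv3 processor expects (a list of words plus one box per word); a finer per-word box would need character-level positions, which this sketch does not assume.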
 
Training Metrics
Evaluation
Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap versus ground truth
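
F1 and token overlap on extractive QA are commonly computed SQuAD-style: normalize both strings, split into tokens, and take the harmonic mean of token precision and recall. The card does not specify its normalization, so the sketch below assumes the SQuAD convention; `normalize_answer`, `exact_match`, and `token_f1` are hypothetical names. Note that whitespace tokenization and English article stripping are crude for Chinese and Spanish queries.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "amount due" against the ground truth "Total amount due" gives precision 1.0 and recall 2/3, so F1 is 0.8, while exact match is false.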
 
Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
 
How to Use
- Usage code is available in the GitHub repository listed under Model Sources.
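
For a self-contained starting point, here is a minimal inference sketch using the transformers Auto classes. It assumes the fine-tuned checkpoint loads with `AutoModelForQuestionAnswering`, that the base model's processor (`microsoft/layoutlmv3-base` with `apply_ocr=False`) matches its tokenizer, and that OCR words and boxes (already scaled to 0-1000) are supplied by the caller. `answer_question`, `best_span`, and `model_path` are hypothetical names, not the repository's code.

```python
def best_span(start_logits, end_logits, allowed=None, max_answer_len=30):
    """Pick (start, end) maximizing start_logits[start] + end_logits[end].

    Only spans with start <= end < start + max_answer_len whose endpoints are in
    `allowed` (e.g. the document-token positions) are considered.
    """
    n = len(start_logits)
    allowed = set(range(n)) if allowed is None else set(allowed)
    best, best_score = (0, 0), float("-inf")
    for start in range(n):
        if start not in allowed:
            continue
        for end in range(start, min(n, start + max_answer_len)):
            if end not in allowed:
                continue
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


def answer_question(image, question, words, boxes, model_path,
                    processor_path="microsoft/layoutlmv3-base"):
    """Run extractive QA on one page given OCR words and 0-1000 boxes.

    `model_path` is a placeholder for wherever the fine-tuned checkpoint lives.
    """
    # Imported lazily so best_span() works without torch/transformers installed.
    import torch
    from transformers import AutoModelForQuestionAnswering, AutoProcessor

    # apply_ocr=False: supply our own OCR words/boxes instead of the built-in Tesseract OCR.
    processor = AutoProcessor.from_pretrained(processor_path, apply_ocr=False)
    model = AutoModelForQuestionAnswering.from_pretrained(model_path)
    model.eval()

    encoding = processor(image, question, words, boxes=boxes,
                         truncation=True, return_tensors="pt")
    # sequence_ids marks question tokens as 0 and document-word tokens as 1.
    context = [i for i, sid in enumerate(encoding.sequence_ids(0)) if sid == 1]

    with torch.no_grad():
        outputs = model(**encoding)

    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist(), allowed=context)
    answer_ids = encoding["input_ids"][0, start:end + 1]
    return processor.tokenizer.decode(answer_ids, skip_special_tokens=True).strip()
```

A call might look like `answer_question(Image.open("bill.png").convert("RGB"), "What is the amount due?", words, boxes, model_path="path/to/document-qa-model")`, where the file name and path are hypothetical. Restricting the span search to document tokens keeps the model from "answering" with part of the question.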