AI Bookkeeper: Enhancing Accounting Document Understanding Through Supervised Fine-Tuning

Community Article Published February 6, 2025

tosi-n Tosin Dairo

jenesys-ai

hikarimaru Manuel Giambi

jenesys-ai

Abstract

We present the Ark series, a family of large vision-language models (LVLMs) fine-tuned specifically for accounting and financial document understanding. Through extensive experimentation, we demonstrate significant improvements in document understanding, data extraction, and document intelligence for bookkeeping tasks. Our models achieve state-of-the-art performance across multiple bookkeeping-specific benchmarks while maintaining high accuracy. This is a first step towards building an AI Bookkeeper: we begin by delegating manual data entry and ledger coding, before moving on to other end-to-end operational and administrative tasks such as document chase-down and retrieval, bulk editing, and bulk publishing, with the aim of reducing friction and manual intervention.

1. Introduction

Automating bookkeeping requires a robust system that not only understands and processes financial documents with high accuracy, but also reduces the need for manual intervention in the bookkeeping workflow. This publication focuses on Ark’s training for manual data entry and ledger coding tasks. While existing LVLMs show promise in general document understanding, the specialised nature of accounting documents presents unique challenges that require domain-specific optimisation.

2. Dataset

2.1 Data Collection

Our dataset comprises two main sources:

Historical records from expert accountant annotations
New annotations from in-house and outsourced accounting experts

The dataset includes:

Invoice processing samples
Receipt analysis samples
Personalised ledger data

2.2 Annotation Process

Annotations were collected from three sources:

AI annotation (40%)
In-house accounting and bookkeeping experts (40%)
Outsourced accounting experts (20%)

2.3 Prompting Strategy

We employ Chain of Thought (CoT) and Tree of Thought (ToT) prompting methods to guide structured extraction, categorisation, and decision-making processes:

Chain of Thought (CoT): Used for tasks requiring sequential reasoning, including VAT calculations, arithmetic checks, and accounting document type classification.
Tree of Thought (ToT): Used for ambiguous or hierarchical tasks such as line item extraction from nested tables, resolving supplier discrepancies, and maintaining context across the pages of a single multi-page document.
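
As a rough illustration of the CoT style described above, the sketch below builds a step-by-step prompt for a VAT arithmetic check and pairs it with a deterministic version of the same check. The function names, the wording of the steps, and the 0.01 tolerance are assumptions made for illustration; the article does not publish its actual prompts.

```python
from decimal import Decimal

# Hypothetical step list; the article does not publish its real prompts.
COT_STEPS = [
    "Identify the document type (invoice, receipt, credit note).",
    "Read the net amount, VAT rate, VAT amount, and gross total.",
    "Recompute VAT as net * rate and compare it with the stated VAT.",
    "Check that net + VAT equals the gross total.",
    "Return the extracted fields as JSON.",
]


def build_cot_prompt(steps):
    """Number the steps so the model is asked to reason through them in order."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return "Work through the following steps in order, showing your reasoning:\n" + numbered


def vat_is_consistent(net, rate, vat, gross, tolerance=Decimal("0.01")):
    """Deterministic version of the arithmetic check the prompt asks for."""
    vat_ok = abs(net * rate - vat) <= tolerance
    total_ok = abs(net + vat - gross) <= tolerance
    return vat_ok and total_ok


prompt = build_cot_prompt(COT_STEPS)
ok = vat_is_consistent(
    Decimal("100.00"), Decimal("0.20"), Decimal("20.00"), Decimal("120.00")
)
```

Using `Decimal` rather than floats avoids rounding surprises on currency amounts; a check like this could be run on the model's extracted fields to flag documents for review.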

[Figure 1: Document Processing Flow (CoT + ToT)]

3. Methodology

[Figure 2: Training Pipeline Diagram]

3.1 Low-Rank Adaptation (LoRA)

Configuration:

Backbone (Vision Encoder): rank 16 (R-16)
LLM: rank 16 (R-16)

Parameter efficiency:

trainable params: 6,291,456 || trainable%: 2.0275153367147256
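
To make the parameter-efficiency figures easier to interpret, the sketch below shows the standard LoRA parameter count for a single weight matrix and the total implied by the reported percentage. LoRA learns two low-rank factors, A (rank × d_in) and B (d_out × rank), so it adds rank × (d_in + d_out) trainable parameters per adapted matrix. Reading "R-16" as rank 16 and the 4096 × 4096 layer shape are both assumptions; the article does not list which modules were adapted.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank), so the count is
    rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Hypothetical layer shape; the article does not list the adapted modules.
per_matrix = lora_param_count(d_in=4096, d_out=4096, rank=16)

# Figures as reported in the article.
trainable = 6_291_456
trainable_pct = 2.0275153367147256

# Total parameter count implied by the reported percentage.
implied_total = trainable / (trainable_pct / 100)
```

Under these assumed shapes, 48 such matrices would account for exactly 6,291,456 parameters, though that is only one decomposition consistent with the reported count. The implied denominator is roughly 310 million parameters, well below 8B; the article does not say which component or model the percentage was measured against.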

3.2 Supervised Fine-tuning (SFT)

Training parameters:

Model Name	Param Size	Learning Rate	Batch Size	Gradient Accumulation Steps	Epochs	Warmup Ratio	Weight Decay
Ark I	8B	2e-5	1	4	5	0.03	0.05
Ark I	8B	5e-5	1	2	8	0.05	0.03
Ark I	8B	4.5e-5	1	2	6	0.04	0.04
Ark II	26B	1e-6	1	2	6	0.04	0.04
Ark II	26B	5e-5	1	2	6	0.05	0.03
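
Because every run uses a batch size of 1, the number of samples contributing to each optimizer update is set by gradient accumulation. The sketch below computes that effective batch size for each row of the table. It assumes the listed batch size is per device and that training ran on a single device (`num_devices=1`); neither detail is stated in the article, and the `SFTRun` class is a hypothetical container for illustration.

```python
from dataclasses import dataclass


@dataclass
class SFTRun:
    """One row of the training-parameter table (hypothetical container)."""
    model: str
    learning_rate: float
    batch_size: int
    grad_accum_steps: int
    epochs: int

    def effective_batch_size(self, num_devices=1):
        # Samples per optimizer update = per-device batch * accumulation * devices.
        return self.batch_size * self.grad_accum_steps * num_devices


RUNS = [
    SFTRun("Ark I", 2e-5, 1, 4, 5),
    SFTRun("Ark I", 5e-5, 1, 2, 8),
    SFTRun("Ark I", 4.5e-5, 1, 2, 6),
    SFTRun("Ark II", 1e-6, 1, 2, 6),
    SFTRun("Ark II", 5e-5, 1, 2, 6),
]

effective = [run.effective_batch_size() for run in RUNS]
```

Under the single-device assumption, the first Ark I run updates on 4 samples at a time and the remaining runs on 2.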

[Figure 3: Loss Convergence Graph]

4. Results

[Figure 4: Model Performance by Category]

4.1 Model Performance Analysis

The Ark series demonstrates progressive improvements across key metrics as seen in our leaderboard:

Ark I (8B): Established baseline performance with 64.1% accuracy in accounting document classification
Ark II (26B): Achieved 71.8% accuracy with enhanced comprehension of complex accounting document structures
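
The gap between the two models can be expressed either in percentage points or as a relative gain, and the two readings differ noticeably. A small worked calculation using only the accuracies reported above:

```python
# Classification accuracies as reported in the article (percent).
ark_i_acc = 64.1
ark_ii_acc = 71.8

# Absolute gain, in percentage points.
absolute_gain = ark_ii_acc - ark_i_acc

# Relative gain, as a percentage of the Ark I baseline.
relative_gain = absolute_gain / ark_i_acc * 100
```

Here the absolute gain is about 7.7 percentage points and the relative gain about 12%. The same distinction applies when reading any percentage improvement figure.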

4.2 Comparative Analysis

Document Understanding: 15% improvement over GPT-4o in accounting document comprehension
Processing Speed: 2.5x faster document processing compared to human benchmarks

5. Discussion

5.1 Technical Advancements

The results demonstrate significant improvements in:

Accounting Document Processing:
- Enhanced transaction classification accuracy
- Adaptive document type recognition
Workflow Integration:
- Streamlined processing pipeline
- Multi-document context understanding

5.2 Current Limitations

Complex multi-page multi-document handling
Cross-reference validation

6. Conclusion

The Ark series demonstrates the effectiveness of SFT in specialised document understanding tasks. With Ark III on the horizon, future work will focus on:

Advanced Reinforcement Learning (RL) Integration
Enhanced Workflow Automation

This technical report establishes a foundation for next-generation autonomous bookkeeping systems, with planned developments aimed at continuously exceeding current performance benchmarks through sophisticated RL techniques and workflow automation.
