Pclanglais committed on
Commit c7e0443 · verified · 1 Parent(s): 5b8d9a9

Create README.md

Files changed (1)
  1. README.md +16 -0
README.md ADDED
@@ -0,0 +1,16 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - fr
+ - de
+ ---
+ **OCRerrcr** is a small language model specialized in detecting OCR errors.
+
+ OCRerrcr was trained by Elliot Jones for PleIAs on a sample of 1,000 documents with labelled OCR errors, drawn from open data documents (Finance Commons) and cultural heritage sources (Common Corpus).
+
+ To date, OCRerrcr provides the most accurate agnostic estimate of the OCR error rate. PleIAs has also developed an alternative pipeline for this task, [OCRoscope](https://github.com/Pleias/OCRoscope), which scales significantly better but is also significantly less accurate, especially for documents with fewer mistakes.
+
+ The name OCRerrcr (instead of OCRerror) is a playful allusion to a common OCR misreading.
+
+ ## Example