---
license: cc-by-nc-nd-4.0
datasets:
- taln-ls2n/Adminset
language:
- fr
library_name: transformers
tags:
- camembert
- BERT
- Administrative documents
---

# AdminBERT 4GB: A Small French Language Model Adapted to Administrative Documents

[AdminBERT-4GB](example) is a French language model adapted to the administrative domain using a large corpus of 10 million French administrative texts. It is a derivative of the CamemBERT model, which is based on the RoBERTa architecture. AdminBERT-4GB was trained with the Whole Word Masking (WWM) objective at a 30% mask rate for 2 epochs on 8 V100 GPUs. The training data is a sample of [Adminset](https://huggingface.co/datasets/taln-ls2n/Adminset).
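
Since the model is trained with a masked language modelling objective, it can be loaded and queried with the `transformers` library. The sketch below is illustrative only: the repository id `taln-ls2n/AdminBERT-4GB` and the example sentence are assumptions, not taken from this card.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# NOTE: the repository id below is an assumption for illustration;
# replace it with the actual published checkpoint id.
model_id = "taln-ls2n/AdminBERT-4GB"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# The model was pre-trained with a (whole word) masked language modelling
# objective, so it can be queried through the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("Le conseil municipal a approuvé le <mask> communal."))
```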


## Evaluation

Since, to date, no evaluation corpus composed of French administrative documents was available, we decided to build our own for the NER (Named Entity Recognition) task.

### Model Performance

| Model                  | P (%)   | R (%)   | F1 (%)  |
|------------------------|---------|---------|---------|
| Wikineural-NER FT      | 77.49   | 75.40   | 75.70   |
| NERmemBERT-Large FT    | 77.43   | 78.38   | 77.13   |
| CamemBERT FT           | 77.62   | 79.59   | 77.26   |
| NERmemBERT-Base FT     | 77.99   | 79.59   | 78.34   |
| AdminBERT-NER 4GB      | 78.47   | 80.35   | 79.26   |
| AdminBERT-NER 16GB     | 78.79   | 82.07   | 80.11   |

To evaluate each model, we performed five runs on the test set of [Adminset-NER](https://huggingface.co/datasets/taln-ls2n/Adminset-NER) and averaged the results.
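
A fine-tuned NER checkpoint can be used through the token-classification pipeline. This is a minimal sketch; the model id `taln-ls2n/AdminBERT-NER` and the example sentence are hypothetical and may not match the published checkpoints.

```python
from transformers import pipeline

# NOTE: the model id is an assumption; the fine-tuned NER checkpoints may be
# published under different names.
ner = pipeline(
    "token-classification",
    model="taln-ls2n/AdminBERT-NER",
    aggregation_strategy="simple",  # merge sub-word tokens into entity spans
)
print(ner("La mairie de Nantes a publié un arrêté relatif à la voirie communale."))
```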