|
--- |
|
library_name: transformers |
|
tags: |
|
- government |
|
- conversational |
|
- question-answering |
|
- dutch |
|
- geitje |
|
license: apache-2.0 |
|
datasets: |
|
- Nelis5174473/Dutch-QA-Pairs-Rijksoverheid |
|
language: |
|
- nl |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
<p align="center" style="margin:0;padding:0"> |
|
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65e04544f59f66e0e072dc5c/b-OsZLNJtPHMwzbgwmGlV.png" alt="GovLLM Ultra banner" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
|
</p> |
|
|
|
<div style="margin:auto; text-align:center"> |
|
<h1 style="margin-bottom: 0">GovLLM-7B-ultra</h1> |
|
  <em>A question-answering model about the Dutch Government.</em>
|
</div> |
|
|
|
## Model description |
|
|
|
This model is a fine-tuned version of the Dutch conversational model [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), trained on a [dataset of Dutch question-answer pairs](https://huggingface.co/datasets/Nelis5174473/Dutch-QA-Pairs-Rijksoverheid) from the Dutch central government (Rijksoverheid). It is a Dutch question-answering model, ultimately based on Mistral, and fine-tuned with SFT and LoRA. Training for 3 epochs took almost 2 hours on a single NVIDIA A100 (40 GB VRAM).
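
The question-answer pairs used for fine-tuning are publicly available on the Hugging Face Hub. The snippet below is a minimal sketch for inspecting them with the `datasets` library; the split and column names are not documented here, so it only prints whatever the first available split contains.

```python
from datasets import load_dataset

# Load the QA dataset the model was fine-tuned on
ds = load_dataset("Nelis5174473/Dutch-QA-Pairs-Rijksoverheid")

# Show the available splits and one example record
print(ds)
first_split = next(iter(ds))   # name of the first available split
print(ds[first_split][0])      # a single question-answer pair
```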
|
|
|
## Usage with Inference Endpoints (Dedicated)
|
|
|
```python |
|
import requests |
|
|
|
API_URL = "https://your-own-endpoint.us-east-1.aws.endpoints.huggingface.cloud" |
|
headers = {"Authorization": "Bearer hf_your_own_token"} |
|
|
|
def query(payload): |
|
response = requests.post(API_URL, headers=headers, json=payload) |
|
return response.json() |
|
|
|
output = query({
    # "Does the government give subsidies to companies?"
    "inputs": "Geeft de overheid subsidie aan bedrijven?"
})
|
|
|
# print generated answer |
|
print(output[0]['generated_text']) |
|
``` |
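
## Usage with Transformers (local)

The model can also be run locally with `transformers`. The snippet below is a hedged sketch rather than an official recipe: `MODEL_ID` is a placeholder for this repository's actual name, and it assumes the tokenizer carries the chat template inherited from the GEITje-7B-ultra base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-namespace/GovLLM-7B-ultra"  # placeholder: replace with the actual repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build the prompt with the chat template (assumed to come from the base model)
messages = [{"role": "user", "content": "Geeft de overheid subsidie aan bedrijven?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```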
|
|
|
## Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
|
- block_size: 1024

- model_max_length: 2048

- padding: right

- mixed_precision: fp16

- learning rate (lr): 0.00003

- epochs: 3

- batch_size: 2

- optimizer: adamw_torch

- scheduler: linear

- quantization: int8

- peft: true

- lora_r: 16

- lora_alpha: 16

- lora_dropout: 0.05
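
These settings appear to come from a fine-tuning CLI run. As a hedged illustration only, the sketch below shows roughly how they map onto `peft` and `transformers` configuration objects; the `target_modules` selection and the output directory are assumptions, not values taken from the original run.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings as listed above
# (target_modules is an assumption, not documented for the original run)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Optimisation settings as listed above
training_args = TrainingArguments(
    output_dir="govllm-7b-ultra",    # assumption
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    fp16=True,
)
```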
|
|
|
### Training results |
|
|
|
| Epoch | Loss   | Grad norm | Learning rate | Step    |
|:-----:|-------:|:---------:|:-------------:|:-------:|
| 0.14  | 1.3183 | 0.6038    | 1.3888e-05    | 25/540  |
| 0.42  | 1.0220 | 0.4180    | 2.8765e-05    | 75/540  |
| 0.69  | 0.9251 | 0.4119    | 2.5679e-05    | 125/540 |
| 0.97  | 0.9260 | 0.4682    | 2.2592e-05    | 175/540 |
| 1.25  | 0.8586 | 0.5338    | 1.9506e-05    | 225/540 |
| 1.53  | 0.8767 | 0.6359    | 1.6420e-05    | 275/540 |
| 1.80  | 0.8721 | 0.6137    | 1.3333e-05    | 325/540 |
| 2.08  | 0.8469 | 0.7310    | 1.0247e-05    | 375/540 |
| 2.36  | 0.8324 | 0.7945    | 7.1605e-06    | 425/540 |
| 2.64  | 0.8170 | 0.8522    | 4.0741e-06    | 475/540 |
| 2.91  | 0.8185 | 0.8562    | 9.8765e-07    | 525/540 |