|
--- |
|
library_name: transformers |
|
tags: |
|
- government |
|
- conversational |
|
- question-answering |
|
- dutch |
|
- geitje |
|
license: apache-2.0 |
|
datasets: |
|
- Nelis5174473/Dutch-QA-Pairs-Rijksoverheid |
|
language: |
|
- nl |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
<p align="center" style="margin:0;padding:0"> |
|
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65e04544f59f66e0e072dc5c/b-OsZLNJtPHMwzbgwmGlV.png" alt="GovLLM Ultra banner" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
|
</p> |
|
|
|
<div style="margin:auto; text-align:center"> |
|
<h1 style="margin-bottom: 0">GovLLM-7B-ultra</h1> |
|
  <em>A question-answering model about the Dutch Government.</em>
|
</div> |
|
|
|
## Model description |
|
|
|
This model is a fine-tuned version of the Dutch conversational model [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), trained on a [dataset of Dutch question-answer pairs](https://huggingface.co/datasets/Nelis5174473/Dutch-QA-Pairs-Rijksoverheid) from the Dutch central government (Rijksoverheid). It is a Dutch question-answering model, ultimately based on Mistral, and fine-tuned with SFT and LoRA. Training for 3 epochs took almost 2 hours on a single NVIDIA A100 (40 GB VRAM).
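
The question-answer pairs used for fine-tuning are publicly available on the Hugging Face Hub. The snippet below is a minimal sketch for inspecting them with the `datasets` library; the split and column names are not documented here, so it only prints whatever the first available split contains.

```python
from datasets import load_dataset

# Load the QA dataset the model was fine-tuned on
ds = load_dataset("Nelis5174473/Dutch-QA-Pairs-Rijksoverheid")

# Show the available splits and one example record
print(ds)
first_split = next(iter(ds))   # name of the first available split
print(ds[first_split][0])      # a single question-answer pair
```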
|
|
|
## Usage with Inference Endpoints (Dedicated)
|
|
|
```python |
|
import requests |
|
|
|
API_URL = "https://your-own-endpoint.us-east-1.aws.endpoints.huggingface.cloud" |
|
headers = {"Authorization": "Bearer hf_your_own_token"} |
|
|
|
def query(payload): |
|
response = requests.post(API_URL, headers=headers, json=payload) |
|
return response.json() |
|
|
|
output = query({
    # "Does the government give subsidies to companies?"
    "inputs": "Geeft de overheid subsidie aan bedrijven?"
})
|
|
|
# print generated answer |
|
print(output[0]['generated_text']) |
|
``` |
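
## Usage with Transformers (local)

The model can also be run locally with `transformers`. The snippet below is a hedged sketch rather than an official recipe: `MODEL_ID` is a placeholder for this repository's actual name, and it assumes the tokenizer carries the chat template inherited from the GEITje-7B-ultra base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-namespace/GovLLM-7B-ultra"  # placeholder: replace with the actual repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build the prompt with the chat template (assumed to come from the base model)
messages = [{"role": "user", "content": "Geeft de overheid subsidie aan bedrijven?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```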
|
|
|
## Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
|
- block_size: 1024

- model_max_length: 2048

- padding: right

- mixed_precision: fp16

- learning rate (lr): 0.00003

- epochs: 3

- batch_size: 2

- optimizer: adamw_torch

- scheduler: linear

- quantization: int8

- peft: true

- lora_r: 16

- lora_alpha: 16

- lora_dropout: 0.05
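
These settings appear to come from a fine-tuning CLI run. As a hedged illustration only, the sketch below shows roughly how they map onto `peft` and `transformers` configuration objects; the `target_modules` selection and the output directory are assumptions, not values taken from the original run.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings as listed above
# (target_modules is an assumption, not documented for the original run)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Optimisation settings as listed above
training_args = TrainingArguments(
    output_dir="govllm-7b-ultra",    # assumption
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    fp16=True,
)
```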
|
|
|
### Training results |
|
|
|
| Epoch | Loss   | Grad norm | Learning rate | Step    |
|:-----:|-------:|:---------:|:-------------:|:-------:|
| 0.14  | 1.3183 | 0.6038    | 1.3888e-05    | 25/540  |
| 0.42  | 1.0220 | 0.4180    | 2.8765e-05    | 75/540  |
| 0.69  | 0.9251 | 0.4119    | 2.5679e-05    | 125/540 |
| 0.97  | 0.9260 | 0.4682    | 2.2592e-05    | 175/540 |
| 1.25  | 0.8586 | 0.5338    | 1.9506e-05    | 225/540 |
| 1.53  | 0.8767 | 0.6359    | 1.6420e-05    | 275/540 |
| 1.80  | 0.8721 | 0.6137    | 1.3333e-05    | 325/540 |
| 2.08  | 0.8469 | 0.7310    | 1.0247e-05    | 375/540 |
| 2.36  | 0.8324 | 0.7945    | 7.1605e-06    | 425/540 |
| 2.64  | 0.8170 | 0.8522    | 4.0741e-06    | 475/540 |
| 2.91  | 0.8185 | 0.8562    | 9.8765e-07    | 525/540 |