Model Card for Futurice/gpt3-finnish-3B-instruct

gpt3-finnish-3B-instruct is an instruction-fine-tuned model intended for RAG-style Q&A in Finnish.

Model Details

Model Description

The gpt3-finnish-3B-instruct model is based on the TurkuNLP Finnish GPT-3 models, a family of pretrained monolingual GPT-style language models built on the BLOOM architecture.

The model was fine-tuned on a sample of the TurkuNLP/squad_v2_fi dataset, a DeepL translation of SQuAD2.0.

  • Developed by: Martti Sutinen
  • Model type: BLOOM
  • Language(s) (NLP): Finnish
  • License: Apache-2.0
  • Finetuned from model: TurkuNLP/gpt3-finnish-large

Uses

Intended for RAG type Q&A in Finnish.

Direct Use

Intended for text generation and RAG-style Q&A in Finnish: supply a context passage and ask a question about it.

Out-of-Scope Use

Not recommended for use cases other than Finnish RAG-style Q&A. Please do not misuse the model.

Bias, Risks, and Limitations

A key limitation is the simple and limited selection of fine-tuning data. Do not expect high-quality answers.

Recommendations

It is recommended to continue fine-tuning with more data or to move to a newer architecture.

How to Get Started with the Model

  • Recommended system message: "Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. Kirjoita vastaus parhaasi mukaan siten että se täyttää kysymyksen tai tehtävän vaatimukset." (English: "You are an assistant. Next you will receive a question or a task. Write your answer as well as you can so that it meets the requirements of the question or task.")
  • Recommended format for a question about a context: "Tausta: {context} \n\nKäytä vain taustaa ja vastaa kysymykseen tai tehtävään: {question}" (English: "Background: {context} \n\nUse only the background and answer the question or task: {question}")
  • Prompt format: tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

where messages has the typical format: messages = [{"role": "system", "content": system_message}, {"role": "user", "content": prompt_with_context}].

Here is what the input could look like:

<s><|im_start|>system
Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. Kirjoita vastaus parhaasi mukaan siten että se täyttää kysymyksen tai tehtävän vaatimukset.<|im_end|>
<|im_start|>user
Tausta:
Dokumentti luotiin tammikuussa. Sen kirjoittajaa ei tunneta.

Käytä vain taustaa ja vastaa kysymykseen tai tehtävään: Milloin dokumentti kirjoitettiin?<|im_end|>
<|im_start|>assistant

Use a text-generation pipeline with the recommended format.
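Assuming the tokenizer's chat template produces the ChatML-style layout shown above, the formatting step can be sketched with plain Python. The `build_prompt` helper below is hypothetical and only illustrates the layout; in practice, use `tokenizer.apply_chat_template` as described above.

```python
def build_prompt(messages):
    """Mimic the ChatML-style layout shown in the example input.

    Hypothetical helper for illustration; the real formatting comes from
    tokenizer.apply_chat_template(messages, tokenize=False,
                                  add_generation_prompt=True).
    """
    parts = ["<s>"]
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Generation prompt: open the assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "Olet avustaja."},
    {"role": "user", "content": "Tausta:\nDokumentti luotiin tammikuussa.\n\n"
                                "Käytä vain taustaa ja vastaa kysymykseen tai "
                                "tehtävään: Milloin dokumentti luotiin?"},
]
prompt = build_prompt(messages)
print(prompt)
```

The resulting string matches the example input above and can be passed directly to a text-generation pipeline.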

Training Details

Training Data

Trained with 20,000 random samples from the test data of TurkuNLP/squad_v2_fi.

Training Procedure

The base model was loaded in 4-bit precision and trained with supervised fine-tuning (SFT) using LoRA adapters.

Training Hyperparameters

  • Training regime: 4-bit quantization, batch size 2, max steps 20,000, completion-only data collator
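The setup above can be sketched as follows. This is a minimal sketch assuming the Hugging Face TRL/PEFT/bitsandbytes stack; the LoRA rank and alpha, the compute dtype, the response template string, and the dataset preparation are assumptions for illustration, not details taken from the original run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM

base = "TurkuNLP/gpt3-finnish-large"

# Load the base model in 4-bit precision (QLoRA-style setup).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA adapter config; rank and alpha are illustrative assumptions.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Compute loss on the completion (assistant answer) only; the response
# template is assumed to match the ChatML-style prompt format.
collator = DataCollatorForCompletionOnlyLM(
    "<|im_start|>assistant\n", tokenizer=tokenizer
)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        per_device_train_batch_size=2,
        max_steps=20000,
        output_dir="gpt3-finnish-3B-instruct",
    ),
    train_dataset=train_dataset,  # placeholder: 20k formatted squad_v2_fi samples
    peft_config=peft_config,
    data_collator=collator,
)
trainer.train()
```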

Evaluation

A thorough evaluation has not yet been performed.

Testing Data, Factors & Metrics

Testing Data

Evaluated with 1,000 random samples from the test data of TurkuNLP/squad_v2_fi.

Factors

Same factors as in SQuAD2.0.

Metrics

Loss.

Results

No results to be shared yet.

Summary

Environmental Impact

Environmental impact not yet evaluated.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Mostly trained on A100
  • Hours used: 5-10 hours
  • Cloud Provider: GCP
  • Compute Region: Unknown
  • Carbon Emitted: Not evaluated

Model Architecture and Objective

BLOOM architecture (decoder-only transformer) with a causal language-modeling objective.

Compute Infrastructure

Google Colab.

Hardware

1 x A100.

Software

Standard Hugging Face fine-tuning stack (Transformers with LoRA-based SFT).

Model Card Contact

Martti Sutinen

Model size: 3.19B parameters (safetensors, F32).