MinistryofDigitalAffairs committed
Commit 8de2b5a · verified · 1 parent: a93c430

Update README.md

Files changed (1): README.md (+4 −2)
README.md CHANGED

@@ -1,5 +1,7 @@
 ---
-license: cc-by-nd-4.0
+license: cc-by-nc-4.0
+language:
+- pl
 ---
 <p align="center">
 <img src="https://pllum.org.pl/_nuxt/PLLuM_logo_RGB_color.DXNEc-VR.png">
@@ -50,7 +52,7 @@ Below is a summary of the main PLLuM models, including their licenses, bases, an
 
 ### Model Development
 - **Pretraining**: All models were pretrained or continued-pretrained on large-scale Polish corpora (up to 150B tokens) plus a range of additional Slavic/Baltic and English texts.
-- **Instruction Fine-Tuning**: We refined the models on manually curated Polish “organic instructions,” converted instructions from premium Polish corpora, and synthetic instructions generated by strong LLMs.
+- **Instruction Fine-Tuning**: We refined the models on manually curated Polish “organic instructions” (approx. 40k), converted instructions from premium Polish corpora (approx. 50k), and synthetic instructions generated by strong LLMs (approx. 10k).
 - **Alignment and Preference Learning**: Manually annotated preference data taught the models to produce safer, balanced, and contextually appropriate responses, even in adversarial or sensitive cases.
 - **Domain-Specific Adaptations**: Specialized RAG-based (Retrieval Augmented Generation) models were developed for tasks like public administration, demonstrating strong performance in complex information retrieval and question answering.
 
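The substantive change in this commit is to the card’s YAML front matter: the license moves from CC BY-ND 4.0 (no derivatives) to CC BY-NC 4.0 (non-commercial), and `pl` is declared as the model language. Since the front matter is machine-readable metadata, the change is easiest to verify by parsing the card rather than eyeballing the rendered page. Below is a minimal sketch using the `ModelCard` helper from `huggingface_hub`; the repo ID is a placeholder, not something stated on this page, and `ModelCard.load` also accepts a local path to a README.md file.

```python
from huggingface_hub import ModelCard

# Placeholder repo ID -- substitute the actual PLLuM repository this card belongs to.
card = ModelCard.load("some-org/some-pllum-model")

# After this commit, the parsed front matter should expose the updated metadata.
print(card.data.license)   # expected: "cc-by-nc-4.0" (previously "cc-by-nd-4.0")
print(card.data.language)  # expected: ["pl"]
```

Declaring `language: pl` matters beyond documentation: the Hub reads this field to populate the model’s language tag, so the repository becomes discoverable under the Polish language filter.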