avemio-digital committed
Commit 05007e3 · verified · 1 Parent(s): 3908f93

Update README.md

Files changed (1)
  1. README.md +21 -21
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 datasets:
-- avemio/German_RAG-CPT-HESSIAN-AI
+- avemio/German-RAG-CPT-HESSIAN-AI
 language:
 - en
 - de
@@ -18,25 +18,25 @@ tags:
 ---
 
 
-<img src="https://www.German_RAG.ai/wp-content/uploads/2024/12/German_RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German_RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+<img src="https://www.German-RAG.ai/wp-content/uploads/2024/12/German-RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German-RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
 
-# German_RAG-Mistral-Nemo-Base-2407-CPT-HESSIAN-AI
+# German-RAG-Mistral-Nemo-Base-2407-CPT-HESSIAN-AI
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-**German_RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025
+**German-RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025.
 
-Our German_RAG-MISTRAL-NEMO-CPT model are trained on this **[German_RAG-CPT](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) dataset.**
+Our German-RAG-MISTRAL-NEMO-CPT model is trained on the **[German-RAG-CPT](https://huggingface.co/datasets/avemio/German-RAG-CPT-HESSIAN-AI) dataset**.
 
 ## Model Details
 
 The core models released in this batch are the following:
 | Size | Training Tokens |
 |------|--------|
-| [German_RAG-MISTRAL-NEMO-CPT](https://huggingface.co/avemio/German_RAG-NEMO-12B-CPT-HESSIAN-AI) | 507.47 million |
-| [German_RAG-MISTRAL-NEMO-SFT](https://huggingface.co/avemio/German_RAG-NEMO-12B-SFT-HESSIAN-AI) | 2.03 billion |
-| [German_RAG-MISTRAL-NEMO-ORPO](https://huggingface.co/avemio/German_RAG-NEMO-12B-ORPO-HESSIAN-AI) | 2.0577 billion |
+| [German-RAG-MISTRAL-NEMO-CPT](https://huggingface.co/avemio/German-RAG-NEMO-12B-CPT-HESSIAN-AI) | 507.47 million |
+| [German-RAG-MISTRAL-NEMO-SFT](https://huggingface.co/avemio/German-RAG-NEMO-12B-SFT-HESSIAN-AI) | 2.03 billion |
+| [German-RAG-MISTRAL-NEMO-ORPO](https://huggingface.co/avemio/German-RAG-NEMO-12B-ORPO-HESSIAN-AI) | 2.0577 billion |
 ### Model Description
 
 <!-- Provide a longer summary of what this model is. -->
@@ -46,19 +46,19 @@ The core models released in this batch are the following:
 - **Model type:** a Transformer style autoregressive language model.
 - **Language(s) (NLP):** German, English
 - **License:** The code and model are released under Apache 2.0.
-- **Contact:** [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)
+- **Contact:** [German-RAG@avemio.digital](mailto:German-RAG@avemio.digital)
 
 
 ### Model Sources
 
 <!-- Provide the basic links for the model. -->
 
-- **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German_RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
+- **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German-RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
 - **Repositories:**
   - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
   - Evaluation code:
-    - [German_RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-HARD-BENCHMARK.git)
-    - [German_RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-EASY-BENCHMARK.git)
+    - [German-RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German-RAG-LLM-HARD-BENCHMARK.git)
+    - [German-RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German-RAG-LLM-EASY-BENCHMARK.git)
 - **Technical blog post:**
 <!-- - **Press release:** TODO -->
 
@@ -72,7 +72,7 @@ Now, proceed as usual with HuggingFace:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "avemio/German_RAG-NEMO-12B-CPT-HESSIAN-AI"
+model_name = "avemio/German-RAG-NEMO-12B-CPT-HESSIAN-AI"
 
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
@@ -92,7 +92,7 @@ We are providing a comprehensive Google Colab notebook to guide users through th
 ## Model Details
 
 ### Data
-For training data details, please see the [German_RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) documentation.
+For training data details, please see the [German-RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German-RAG-CPT-HESSIAN-AI) documentation.
 
 #### Description
 CPT – Continued Pre-Training
@@ -107,7 +107,7 @@ The summarization task teaches models to distill complex information into clear,
 ### Architecture
 
 
-| Parameter | German_RAG-MISTRA-NEMO-CPT |
+| Parameter | German-RAG-MISTRAL-NEMO-CPT |
 |-----------------------|----------------------------|
 | **d_model** | 5120 |
 | **num heads** | 32 |
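Note on the Architecture hunk above: the table's `d_model` (5120) and head count (32) imply a per-head dimension, which serves as a quick consistency check on the config. The even split of `d_model` across heads is the standard multi-head-attention convention, assumed here:

```python
# Values from the Architecture table in the hunk above.
d_model = 5120
num_heads = 32

# Standard multi-head attention splits d_model evenly across heads.
head_dim = d_model // num_heads
print(head_dim)  # 160
```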
@@ -125,7 +125,7 @@ The summarization task teaches models to distill complex information into clear,
 ### Hyperparameters
 
 
-| Parameter | German_RAG-MISTRAL-NEMO-CPT |
+| Parameter | German-RAG-MISTRAL-NEMO-CPT |
 |---------------------------|--------------------|
 | **warmup steps** | 50 |
 | **peak LR** | 5.0E-07 |
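Note on the Hyperparameters hunk above: the 50 warmup steps and 5.0E-07 peak LR can be sketched as a schedule function. The linear ramp and the constant phase after warmup are illustrative assumptions; the table does not state which decay was actually used in training:

```python
def lr_at(step: int, warmup_steps: int = 50, peak_lr: float = 5.0e-7) -> float:
    """Learning rate at a given step: linear warmup to peak_lr, then constant.

    warmup_steps and peak_lr mirror the hyperparameter table above; the
    constant phase after warmup is an assumption for illustration only.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr
```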
@@ -136,19 +136,19 @@ The summarization task teaches models to distill complex information into clear,
 
 ## Environmental Impact
 
-German_RAG-MISTRAL-NEMO-CPT, running on NVIDIA A100 with 40 GPUs for 5 days, has an approximate power consumption as follows:
+German-RAG-MISTRAL-NEMO-CPT, running on 40 NVIDIA A100 GPUs for 5 days, has an approximate power consumption as follows:
 
 It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.
 
 | Model | GPU Type | Power Consumption From GPUs |
 |----------------|---------------------|-----------------------------|
-| German_RAG-MISTRAL-NEMO-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.0144 MWh |
+| German-RAG-MISTRAL-NEMO-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.0144 MWh |
 ## Bias, Risks, and Limitations
 
 Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
 Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.
 
-Otherwise, many facts from German_RAG-MISTRAL-NEMO-CPT or any LLM will often not be true, so they should be checked.
+Moreover, statements from German-RAG-MISTRAL-NEMO-CPT, as from any LLM, will often not be true, so they should be checked.
 
 
 
@@ -156,9 +156,9 @@ Otherwise, many facts from German_RAG-MISTRAL-NEMO-CPT or any LLM will often not
 ## Model Card Contact
 
 
-For errors in this model card, please contact ([German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)).
+For errors in this model card, please contact [German-RAG@avemio.digital](mailto:German-RAG@avemio.digital).
 
-## The German_RAG AI Team
+## The German-RAG AI Team
 [Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
 [Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
 [Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)
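Closing note on the Environmental Impact hunk: for readers estimating consumption of their own runs, the generic back-of-envelope formula is GPUs × hours × average draw per GPU. The sketch below uses an assumed 0.3 kW average draw per A100, which is a placeholder, not a figure from the model card; measured consumption, as the card recommends, can differ substantially.

```python
def gpu_energy_mwh(num_gpus: int, hours: float, kw_per_gpu: float) -> float:
    """Back-of-envelope GPU energy estimate in MWh.

    kw_per_gpu is the assumed average board power draw per GPU; the value
    used in the example below is a placeholder, not from the model card.
    """
    return num_gpus * hours * kw_per_gpu / 1000.0

# 40 GPUs for 5 days at an assumed 0.3 kW average draw per A100:
print(round(gpu_energy_mwh(40, 5 * 24, 0.3), 2))  # 1.44
```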
 