avemio-digital
committed on
Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ tags:
<img src="https://www.grag.ai/wp-content/uploads/2024/12/GRAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="GRAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


-# Model Card for GRAG-
+# Model Card for GRAG-NEMO-12B-SFT-HESSIAN-AI

<!-- Provide a quick summary of what the model is/does. -->

@@ -35,9 +35,9 @@ Our GRAG-LLAMA-SFT model are trained on this **[GRAG-SFT](https://huggingface.co
The core models released in this batch are the following:
| Size | Training Tokens |
|------|--------|
-| [GRAG-
-| [GRAG-
-| [GRAG-
+| [GRAG-NEMO-CPT](https://huggingface.co/avemio/GRAG-NEMO-12B-CPT-HESSIAN-AI) | 507.47 million |
+| [GRAG-NEMO-SFT](https://huggingface.co/avemio/GRAG-NEMO-12B-SFT-HESSIAN-AI) | 2.03 billion |
+| [GRAG-NEMO-ORPO](https://huggingface.co/avemio/GRAG-NEMO-12B-ORPO-HESSIAN-AI) | 2.0577 billion |
### Model Description

<!-- Provide a longer summary of what this model is. -->

@@ -71,7 +71,7 @@ Now, proceed as usual with HuggingFace:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

-model_name = "avemio/GRAG-
+model_name = "avemio/GRAG-NEMO-12B-SFT-HESSIAN-AI"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
@@ -133,7 +133,7 @@ Four evaluation metrics were employed across all subsets: language quality, over
- **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.


-| Metric | [Vanila-
+| Metric | [Vanilla-Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | [GRAG-NEMO-SFT](https://huggingface.co/avemio/GRAG-NEMO-12B-SFT-HESSIAN-AI) | [GRAG-NEMO-ORPO](https://huggingface.co/avemio/GRAG-NEMO-12B-ORPO-HESSIAN-AI) | [GRAG-NEMO-MERGED]() | GPT-3.5-TURBO |
|------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
| **Average_language_quality** | 85.88 | 89.61 | 89.1 | | |
| **extraction_recall_weighted_overall_score** | 35.2 | 52.3 | 48.8 | | |

|
@@ -162,25 +162,25 @@ The implementation of these tasks within RAG systems can significantly improve o
|
|
162 |
### Architecture
|
163 |
|
164 |
|
165 |
-
| Parameter | GRAG-
|
166 |
|-----------------------|-----------------------------------------------------------------------------------------------|
|
167 |
-
| **d_model** |
|
168 |
| **num heads** | 32 |
|
169 |
-
| **num layers** |
|
170 |
-
| **MLP ratio** |
|
171 |
| **LayerNorm type** | RMSNorm |
|
172 |
| **pos embeddings** | RoPE |
|
173 |
| **attention variant**| Standard Multi-Head Self Attention |
|
174 |
| **biases** | none |
|
175 |
| **block type** | sequential |
|
176 |
| **activation** | SiLU |
|
177 |
-
| **sequence length** |
|
178 |
| **weight typing** | bfloat16
|
179 |
|
180 |
### Hyperparameters
|
181 |
|
182 |
|
183 |
-
| Parameter | GRAG-
|
184 |
|---------------------------|--------------------|
|
185 |
| **warmup steps** | 50 |
|
186 |
| **peak LR** | 5.0E-07 |
|
@@ -191,19 +191,19 @@ The implementation of these tasks within RAG systems can significantly improve o

## Environmental Impact

-GRAG-
+GRAG-NEMO-SFT, running on 8 NVIDIA A100 GPUs for 5 days, has an approximate power consumption as follows:

It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.

| Model | GPU Type | Power Consumption From GPUs |
|----------------|---------------------|-----------------------------|
-| GRAG-
+| GRAG-NEMO-SFT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.288 MWh |
## Bias, Risks, and Limitations

Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.

-Otherwise, many facts from GRAG-
+In addition, many facts produced by GRAG-NEMO-SFT, as with any LLM, will often not be true, so they should be checked.
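
As a rough consistency check on the 0.288 MWh figure in the Environmental Impact hunk above, assuming an average draw of about 300 W per A100 (the card does not state the per-GPU draw):

```python
gpus, days, avg_watts = 8, 5, 300          # assumed ~300 W average draw per A100
energy_mwh = gpus * days * 24 * avg_watts / 1e6
print(energy_mwh)                          # 0.288 MWh, matching the table
```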