Add link to Github repository and paper page
This PR improves the model card by adding a link to the Github repository and paper page.
README.md (CHANGED)

````diff
@@ -1,17 +1,15 @@
 ---
+language:
+- en
 library_name: transformers
 license: other
 license_name: nvidia-open-model-license
-license_link:
-https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
-
+license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 pipeline_tag: text-generation
-language:
-- en
 tags:
-
-
-
+- nvidia
+- llama-3
+- pytorch
 ---
 
 # Llama-3.1-Nemotron-Ultra-253B-v1
@@ -28,18 +26,18 @@ The model underwent a multi-phase post-training process to enhance both its reas
 
 This model is ready for commercial use.
 
-For more details on how the model was trained, please see our [technical report](https://
+For more details on how the model was trained, please see our [technical report](https://huggingface.co/papers/2505.00949) and [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/).
 
 ![]()
 
 This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
 
 - [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
-- [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-
+- [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1)
 
 ## License/Terms of Use
 
-GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License.](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/
+GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License.](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
 
 **Model Developer:** NVIDIA
 
@@ -55,15 +53,17 @@ Developers designing AI Agent systems, chatbots, RAG systems, and other AI-power
 
 ## References
 
-* [
-* [
-* [
-* [
+* [[2505.00949] Llama-Nemotron: Efficient Reasoning Models](https://huggingface.co/papers/2505.00949)
+* [[2502.00203] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
+* [[2411.19146] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)
+* [[2503.18908] FFN Fusion: Rethinking Sequential Computation in Large Language Models](https://arxiv.org/abs/2503.18908)
 
 ## Model Architecture
 **Architecture Type:** Dense decoder-only Transformer model
 **Network Architecture:** Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS)
 
+**Github:** https://github.com/NVIDIA/NeMo
+
 **This model was developed based on Llama-3.1-405B-Instruct <br>
 ** This model has 253B model parameters. <br>
 
@@ -248,7 +248,13 @@ Data Labeling for Evaluation Datasets:
 User Prompt Template:
 
 ```
-"What is the correct answer to this question: {question}
+"What is the correct answer to this question: {question}
+Choices:
+A. {option_A}
+B. {option_B}
+C. {option_C}
+D. {option_D}
+Let's think step by step, and put the final answer (should be a single letter A, B, C, or D) into a \boxed{}"
 ```
 
 ### AIME25
@@ -261,7 +267,8 @@ User Prompt Template:
 User Prompt Template:
 
 ```
-"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}
+"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+Question: {question}"
 ```
 
 ### BFCL V2 Live
@@ -339,7 +346,8 @@ You will use the following starter code to write the solution to the problem and
 User Prompt Template:
 
 ```
-"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}
+"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+Question: {question}"
 ```
 
 ### JudgeBench
````