gradientai / Llama-3-8B-Instruct-Gradient-4194k


tpeng726 committed on May 8, 2024

Commit b156a3e (verified) · 1 parent: d5aa9c0

Update title consistency

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -9,7 +9,7 @@ license: llama3
 ---
 <a href="https://www.gradient.ai" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/TSa3V8YpoVagnTYgxiLaO.png" width="200"/></a>
-# Llama-3 8B Gradient Instruct 4194K (Work-in-progress)
+# Llama-3 8B Instruct Gradient 4194K (Work-in-progress)
 Join our custom agent and long context (262k-1M+) waitlist: https://forms.gle/L6TDY7dozx8TuoUv7
@@ -44,7 +44,7 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 |------------------------|-----------|-----------|-----------|-----------|-----------|
 | Initialize From        | LLaMA-3 8B| 65K       | 262K      | 524k      | 1048k     |
 | Sequence Length 2^N    | 16        | 18        | 19        | 20        | 22        |
-| RoPE theta             | 15.3 M    | 207.1 M   | 1.06B     | 2.80B     | 45.2B     |
+| RoPE Theta             | 15.3 M    | 207.1 M   | 1.06B     | 2.80B     | 45.2B     |
 | Batch Size             | 1         | 1         | 16         | 8         | 2         |
 | Gradient Accumulation Steps | 32    | 16        | 1         | 1         | 2         |
 | Steps                  | 30        | 24        | 50        | 50        | 12 (stopped early)       |
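The change itself only touches capitalization, but the table it edits also explains the "4194K" in the model name: each stage trains at a sequence length of 2^N tokens, and the "Initialize From" row names the previous stage by that length in thousands. The Python sketch here reproduces that arithmetic from the table's values. The tokens-per-step figure is an added illustration that the README does not state; it assumes every sample is packed to the full 2^N length, so read it as an upper bound. The helper names (`label`, `tokens_per_step`, `STAGES`) are hypothetical.

```python
# Per-stage values copied from the README table:
# (N where sequence length = 2**N, batch size, gradient accumulation steps)
STAGES = [
    (16, 1, 32),
    (18, 1, 16),
    (19, 16, 1),
    (20, 8, 1),
    (22, 2, 2),
]


def label(n: int) -> str:
    """Render 2**n tokens in thousands, truncated, as the table's stage names do."""
    return f"{2**n // 1000}K"


def tokens_per_step(n: int, batch: int, accum: int) -> int:
    """Tokens per optimizer step.

    Assumption (not from the README): every sample is packed to the full
    2**n sequence length, so this is an upper bound.
    """
    return (2**n) * batch * accum


if __name__ == "__main__":
    for n, batch, accum in STAGES:
        print(
            f"2^{n} = {2**n:>9,} tokens ({label(n)}), "
            f"up to {tokens_per_step(n, batch, accum):,} tokens/step"
        )
```

The last stage prints 2^22 = 4,194,304 tokens, which truncates to the 4194K in the repository name. The table writes 524k and 1048k with a lowercase k; the helper uses K throughout.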