Upload 7 files

Files changed (8)

.gitattributes +1 -0
README.md +89 -3
README_ZH_COT.md +87 -0
model.safetensors.index.json +778 -0
special_tokens_map.json +23 -0
tokenizer.json +3 -0
tokenizer_config.json +197 -0
trainer_state.json +1050 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,89 @@
----
-license: apache-2.0
----

+<p align="left">
+  <a href="https://huggingface.co/datasets/ZTE-AIM/LLM-Adaptive-ZMath-model-32B/README.md">English</a> |
+  <a href="https://huggingface.co/datasets/ZTE-AIM/LLM-Adaptive-ZMath-model-32B/README_ZH-COT.md">中文</a>
+</p>
+datasets:
+- ZTE-AIM/32B_LLM_AdaptiveMath_data
+- ZTE-AIM/32B_LLM_AdaptiveCode_data
+base_model:
+- DeepSeek-R1-Distill-Qwen-32B
+---
+## 32B_LLM_AdaptiveMath_data
+[\[🤗 HF Dataset\]](https://huggingface.co/datasets/ZTE-AIM/32B_LLM_AdaptiveMath_data)
+## LLM-Adaptive-CoT-Code-data
+[\[🤗 HF Dataset\]](https://huggingface.co/datasets/ZTE-AIM/32B_LLM_AdaptiveCode_data)
+## LLM-Adaptive-ZMath-model-32B
+[\[🤗 LLM-Adaptive-ZMath-model-32B\]](https://huggingface.co/ZTE-AIM/LLM-Adaptive-ZMath-model-32B)
+## LLM-Adaptive-ZCode-model-32B
+[\[🤗 LLM-Adaptive-ZCode-model-32B\]](https://huggingface.co/ZTE-AIM/LLM-Adaptive-ZCode-model-32B)
+## Model Overview
+This work presents a fine-tuned reasoning model built on the DeepSeek-Distill architecture using a novel LLM-Adaptive Question Difficulty Grading method. Unlike traditional CoT generation approaches, the method uses the reasoning strength of DeepSeek-R1 (671B) to distill high-quality chain-of-thought (CoT) data. Its core innovation is the dynamic construction of difficulty-aligned datasets based on the target LLM's own problem-solving capabilities.
+The proposed approach first evaluates question difficulty adaptively, then applies tailored sampling and response generation. This lets the model learn efficiently from progressively more challenging problems, boosting reasoning performance in domains such as mathematical problem solving and code generation.
+Fine-tuned variants such as ZMath-32B and ZCode-32B outperform baseline models such as DeepSeek-Distill-32B and phi-4, even with limited high-quality data. Notably, the ZMath-32B model trained on only 2K PRM-graded CoT samples surpassed its baseline on all math benchmarks, confirming the effectiveness of the adaptive CoT generation methodology.
+## Training Configuration
+Our training framework builds on previous advancements in s1-1k, LIMO, and Light-R1, and is implemented with LLaMA-Factory to leverage its proven scalability. The framework incorporates the DeepSeek-R1 template, FlashAttention-2, and Liger-Kernel to improve computational efficiency while minimizing memory requirements. All experiments were conducted on a 2×8 H800 GPU cluster, with performance evaluations run using the SkyThought benchmarking suite.
+The training configuration for GRPO is as follows:
+```text
+Context Length: 16,384 tokens
+Learning Rate: 5e-6
+Batch Size: 128
+Epochs: 10
+```
+## Usage
+You can load the model using the Hugging Face `transformers` library:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Replace with the actual path to your model on Hugging Face.
+model_name = "your-org/ZMath-32B"
+# Load the tokenizer.
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+# Load the model (with multi-GPU support and automatic allocation to available devices).
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,         # Use float16 precision to save GPU memory
+    device_map="auto",                 # Automatically distribute the model across multiple GPUs.
+    trust_remote_code=True
+)
+# Example inference
+prompt = "Solve the following math problem step by step: 12 * (3 + 4) = ?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=100)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+## Paper Link
+- [📄 Read the Paper (PDF)](https://arxiv.org/pdf/2504.11919)
+## Institution
+- ZTE-AIM
+## Model Contact
+- [email protected]
+- [email protected]
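
The training configuration in the README above lists a batch size of 128 on a 2×8 H800 cluster. The README does not say how that batch is split, so the following is only a sketch under two assumptions: that 128 is the global batch per optimizer step, and that "2×8" means 16 data-parallel GPUs. The helper name `batch_splits` is hypothetical and not part of LLaMA-Factory.

```python
from typing import List, Tuple

# Assumptions (not stated in the README): 128 is the global batch size,
# and the 2x8 cluster provides 16 data-parallel GPUs.
NUM_GPUS = 2 * 8
GLOBAL_BATCH = 128


def batch_splits(global_batch: int, num_gpus: int) -> List[Tuple[int, int]]:
    """Return (per_device_batch, grad_accum_steps) pairs that reproduce the global batch."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must be divisible by the number of GPUs")
    # Samples each GPU has to process per optimizer step.
    per_gpu = global_batch // num_gpus
    # Every divisor of per_gpu is a valid per-device micro-batch size.
    return [(b, per_gpu // b) for b in range(1, per_gpu + 1) if per_gpu % b == 0]


splits = batch_splits(GLOBAL_BATCH, NUM_GPUS)
for per_device, accum in splits:
    print(f"per_device_batch={per_device}, grad_accum_steps={accum}")
```

With a 16,384-token context, the smaller per-device batches with more accumulation steps are the memory-friendlier choices, though which one the authors actually used is not documented here.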

README_ZH_COT.md ADDED

	@@ -0,0 +1,87 @@

+<p align="left">
+  <a href="https://huggingface.co/datasets/ZTE-AIM/LLM-Adaptive-ZMath-model-32B/README.md">English</a> |
+  <a href="https://huggingface.co/datasets/ZTE-AIM/LLM-Adaptive-ZMath-model-32B/README_ZH-COT.md">中文</a>
+</p>
+datasets:
+- ZTE-AIM/32B_LLM_AdaptiveMath_data
+- ZTE-AIM/32B_LLM_AdaptiveCode_data
+base_model:
+- DeepSeek-R1-Distill-Qwen-32B
+---
+## 32B_LLM_AdaptiveMath_data
+[\[🤗 HF Dataset\]](https://huggingface.co/datasets/ZTE-AIM/32B_LLM_AdaptiveMath_data)
+## LLM-Adaptive-CoT-Code-data
+[\[🤗 HF Dataset\]](https://huggingface.co/datasets/ZTE-AIM/32B_LLM_AdaptiveCode_data)
+## LLM-Adaptive-ZMath-model-32B
+[\[🤗 LLM-Adaptive-ZMath-model-32B\]](https://huggingface.co/ZTE-AIM/LLM-Adaptive-ZMath-model-32B)
+## LLM-Adaptive-ZCode-model-32B
+[\[🤗 LLM-Adaptive-ZCode-model-32B\]](https://huggingface.co/ZTE-AIM/LLM-Adaptive-ZCode-model-32B)
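+## Model Overview
+This work fine-tunes the DeepSeek-Distill architecture with a novel LLM-adaptive question difficulty grading method, yielding an efficient reasoning model. Unlike traditional CoT (Chain-of-Thought) generation methods, it uses the reasoning ability of DeepSeek-R1 (671B) to distill high-quality chain-of-thought data. The core innovation is the dynamic construction of a difficulty-matched dataset based on the target LLM's own problem-solving ability.
+The proposed method includes adaptive evaluation of question difficulty, followed by targeted sampling and response generation. This lets the model learn efficiently from problems of progressively increasing difficulty, significantly improving reasoning performance in domains such as mathematical problem solving and code generation.
+The fine-tuned ZMath-32B and ZCode-32B variants outperform baseline models such as DeepSeek-Distill-32B and phi-4 even when the available high-quality data is limited. Notably, the ZMath-32B model trained on only 2K PRM-graded CoT samples surpassed its baseline on all math benchmarks, confirming the effectiveness of the adaptive CoT generation method.
+## Training Configuration
+Our training framework builds on recent approaches such as s1-1k, LIMO, and Light-R1, and is implemented with LLaMA-Factory for scalability. It incorporates the DeepSeek-R1 template, FlashAttention-2, and Liger-Kernel to improve computational efficiency and reduce memory consumption. All experiments were run on a 2×8 H800 GPU cluster, and performance was evaluated with the SkyThought benchmark suite.
+The key configuration for this GRPO training run is as follows: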
+```text
+Context Length: 16,384 tokens
+Learning Rate: 5e-6
+Batch Size: 128
+Epochs: 10
+```
+## Usage
+You can load the model using the Hugging Face `transformers` library:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Replace with the actual path to your model on Hugging Face.
+model_name = "your-org/ZMath-32B"
+# Load the tokenizer.
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+# Load the model (with multi-GPU support and automatic device allocation).
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,         # Use float16 precision to save GPU memory
+    device_map="auto",                 # Automatically distribute the model across available GPUs
+    trust_remote_code=True
+)
+# Example inference
+prompt = "Solve the following math problem step by step: 12 * (3 + 4) = ?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=100)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+## Paper Link
+- [📄 Read the Paper (PDF)](https://arxiv.org/pdf/2504.11919)
+## Institution
+- ZTE-AIM
+## Model Contact
+- [email protected]
+- [email protected]
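
The `model.safetensors.index.json` file added next in this commit maps each tensor name to one of 14 shard files and records a `total_size` in bytes. As a rough illustration of how such an index can be inspected, the sketch below parses a handful of entries copied from that file, finds layers whose tensors span more than one shard, and estimates the parameter count. The helper name `shards_by_prefix` is hypothetical, and the 2-bytes-per-parameter figure is an assumption (it holds for float16 or bfloat16 weights, which the index itself does not state).

```python
import json
from collections import defaultdict

# A few entries copied from the index in this commit; the real file lists every tensor.
INDEX_EXCERPT = """
{
  "metadata": {"total_size": 65527752704},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00014.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
    "lm_head.weight": "model-00014-of-00014.safetensors"
  }
}
"""


def shards_by_prefix(weight_map, depth=3):
    """Group shard filenames by the first `depth` dotted components of each tensor name."""
    groups = defaultdict(set)
    for tensor_name, shard in weight_map.items():
        prefix = ".".join(tensor_name.split(".")[:depth])
        groups[prefix].add(shard)
    return {prefix: sorted(shards) for prefix, shards in groups.items()}


index = json.loads(INDEX_EXCERPT)
groups = shards_by_prefix(index["weight_map"])

# Layers whose tensors are stored in more than one shard file.
split_layers = [prefix for prefix, shards in groups.items() if len(shards) > 1]

# Assumption: every parameter is stored in 2 bytes (float16 or bfloat16).
approx_params = index["metadata"]["total_size"] // 2

print(split_layers)
print(f"{approx_params / 1e9:.2f}B parameters (if 2 bytes each)")
```

Shard boundaries fall wherever the size limit is reached, so a single layer such as layer 13 can straddle two files; loaders resolve this through the `weight_map` rather than by assuming one layer per shard.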

model.safetensors.index.json ADDED

	@@ -0,0 +1,778 @@

+{
+  "metadata": {
+    "total_size": 65527752704
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00014-of-00014.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00014.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00014.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.k_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.q_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.v_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.28.self_attn.k_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.28.self_attn.q_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.28.self_attn.v_proj.bias": "model-00006-of-00014.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.k_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.q_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.v_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00014.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.k_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.q_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.v_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.k_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.q_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.v_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.input_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.k_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.q_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.v_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.32.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.33.input_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.33.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.33.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.33.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.33.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.33.self_attn.k_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.33.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.33.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.33.self_attn.q_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.33.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.33.self_attn.v_proj.bias": "model-00007-of-00014.safetensors",
+    "model.layers.33.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+    "model.layers.34.input_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.k_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.q_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.v_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.34.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.input_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.k_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.q_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.v_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.35.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.input_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.k_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.q_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.v_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.36.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.input_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.k_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.q_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.v_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.37.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.38.input_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.38.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.38.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.38.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.38.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.38.self_attn.k_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.38.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.38.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.38.self_attn.q_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.38.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.38.self_attn.v_proj.bias": "model-00008-of-00014.safetensors",
+    "model.layers.38.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+    "model.layers.39.input_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.k_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.q_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.v_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.39.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.40.input_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.k_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.q_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.v_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.40.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.input_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.k_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.q_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.v_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.41.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.input_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.k_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.q_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.v_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.42.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.43.input_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.43.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.43.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.43.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.43.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.43.self_attn.k_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.43.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.43.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.43.self_attn.q_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.43.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.43.self_attn.v_proj.bias": "model-00009-of-00014.safetensors",
+    "model.layers.43.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+    "model.layers.44.input_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.k_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.q_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.v_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.44.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.input_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.k_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.q_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.v_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.45.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.input_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.k_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.q_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.v_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.46.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.input_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.k_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.q_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.v_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.47.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.48.input_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.48.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.48.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.48.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.48.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.48.self_attn.k_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.48.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.48.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.48.self_attn.q_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.48.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.48.self_attn.v_proj.bias": "model-00010-of-00014.safetensors",
+    "model.layers.48.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+    "model.layers.49.input_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.k_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.q_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.v_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.49.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.50.input_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.k_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.q_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.v_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.50.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.input_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.k_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.q_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.v_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.51.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.input_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.k_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.q_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.v_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.52.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.53.input_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.53.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.53.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.53.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.53.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.53.self_attn.k_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.53.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.53.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.53.self_attn.q_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.53.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.53.self_attn.v_proj.bias": "model-00011-of-00014.safetensors",
+    "model.layers.53.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
+    "model.layers.54.input_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.k_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.q_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.v_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.54.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.input_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.k_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.q_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.v_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.55.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.input_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.k_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.q_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.v_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.56.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.input_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.k_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.q_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.v_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.57.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.58.input_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.58.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.58.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.58.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.58.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.58.self_attn.k_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.58.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.58.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.58.self_attn.q_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.58.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.58.self_attn.v_proj.bias": "model-00012-of-00014.safetensors",
+    "model.layers.58.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
+    "model.layers.59.input_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.k_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.q_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.v_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.59.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.60.input_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.k_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.q_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.v_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.60.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.input_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.k_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.q_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.v_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.61.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.input_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.k_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.q_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.v_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.62.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.63.input_layernorm.weight": "model-00014-of-00014.safetensors",
+    "model.layers.63.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
+    "model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.63.mlp.up_proj.weight": "model-00014-of-00014.safetensors",
+    "model.layers.63.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
+    "model.layers.63.self_attn.k_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.63.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.63.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.63.self_attn.q_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.63.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.63.self_attn.v_proj.bias": "model-00013-of-00014.safetensors",
+    "model.layers.63.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00014.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00014.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+    "model.norm.weight": "model-00014-of-00014.safetensors"
+  }
+}
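
The index above maps each tensor name to the shard file that stores it, and a single layer can straddle two shards (layer 63 has entries in both `model-00013` and `model-00014`). A minimal sketch of how one might group shards per layer follows. It assumes the mapping lives under the usual `weight_map` key of the index file; the excerpt dictionary and the helper name `shards_by_layer` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical excerpt of the index mapping, copied from entries in the diff above.
weight_map = {
    "model.layers.63.input_layernorm.weight": "model-00014-of-00014.safetensors",
    "model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
    "model.layers.63.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
    "model.layers.62.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
    "model.norm.weight": "model-00014-of-00014.safetensors",
}


def shards_by_layer(mapping):
    """Group shard file names by layer prefix such as 'model.layers.63'."""
    shards = defaultdict(set)
    for name, shard in mapping.items():
        parts = name.split(".")
        # Per-layer tensors are named model.layers.<idx>.<...>;
        # anything else (e.g. model.norm.weight) keeps its full name as the key.
        if len(parts) > 2 and parts[1] == "layers":
            key = ".".join(parts[:3])
        else:
            key = name
        shards[key].add(shard)
    return {key: sorted(files) for key, files in shards.items()}


layer_shards = shards_by_layer(weight_map)
# Layers whose tensors are spread over more than one shard file.
split_layers = [key for key, files in layer_shards.items() if len(files) > 1]
print(split_layers)
```

With the full index loaded via `json.load`, the same helper would show which shard files must be present before a given layer can be materialized.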

special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "bos_token": {
+    "content": "<｜begin▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<｜end▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<｜end▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
+size 11422778

tokenizer_config.json ADDED

	@@ -0,0 +1,197 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<｜end▁of▁sentence｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<｜User｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151645": {
+      "content": "<｜Assistant｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151646": {
+      "content": "<｜begin▁of▁sentence｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|EOT|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151648": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151649": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "bos_token": "<｜begin▁of▁sentence｜>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<｜User｜>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<｜Assistant｜><｜tool▁calls▁begin｜><｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{{'<｜tool▁calls▁end｜><｜end▁of▁sentence｜>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<｜tool▁outputs▁end｜>' + message['content'] + '<｜end▁of▁sentence｜>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<｜Assistant｜>' + content + '<｜end▁of▁sentence｜>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<｜tool▁outputs▁begin｜><｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<｜tool▁outputs▁end｜>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<｜Assistant｜>'}}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<｜end▁of▁sentence｜>",
+  "extra_special_tokens": {},
+  "legacy": true,
+  "model_max_length": 20000,
+  "pad_token": "<｜end▁of▁sentence｜>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "split_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": null,
+  "use_default_system_prompt": false
+}
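
The `chat_template` above is dense, so here is a rough plain-Python sketch of what it appears to do for ordinary system, user, and assistant turns. It is not the template itself: the tool-call and tool-output branches are deliberately left out, and the function name `render_chat` is hypothetical. The token strings are copied from the config.

```python
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"


def render_chat(messages, add_generation_prompt=False):
    """Approximate the template for system/user/assistant messages only."""
    # The template keeps the content of the last system message it sees.
    system_prompt = ""
    for message in messages:
        if message["role"] == "system":
            system_prompt = message["content"]

    out = BOS + system_prompt
    for message in messages:
        if message["role"] == "user":
            out += "<｜User｜>" + message["content"]
        elif message["role"] == "assistant":
            content = message["content"]
            # Earlier reasoning is dropped: only the text after the
            # last </think> is replayed in the conversation history.
            if "</think>" in content:
                content = content.split("</think>")[-1]
            out += "<｜Assistant｜>" + content + EOS
    if add_generation_prompt:
        out += "<｜Assistant｜>"
    return out


single_turn = render_chat(
    [{"role": "user", "content": "What is 2+2?"}],
    add_generation_prompt=True,
)
multi_turn = render_chat(
    [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "<think>add the numbers</think>4"},
        {"role": "user", "content": "And 3+3?"},
    ],
    add_generation_prompt=True,
)
print(single_turn)
print(multi_turn)
```

In practice one would rely on the tokenizer's own template rendering rather than this sketch; the point here is only to make the turn layout and the `</think>` stripping visible.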

trainer_state.json ADDED

	@@ -0,0 +1,1050 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 9.654545454545454,
+  "eval_steps": 30,
+  "global_step": 270,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.07272727272727272,
+      "grad_norm": 0.42279353737831116,
+      "learning_rate": 3.3333333333333333e-06,
+      "loss": 0.4041,
+      "step": 2
+    },
+    {
+      "epoch": 0.14545454545454545,
+      "grad_norm": 0.39341622591018677,
+      "learning_rate": 4.999826945767665e-06,
+      "loss": 0.4252,
+      "step": 4
+    },
+    {
+      "epoch": 0.21818181818181817,
+      "grad_norm": 0.28936225175857544,
+      "learning_rate": 4.998442655654946e-06,
+      "loss": 0.4065,
+      "step": 6
+    },
+    {
+      "epoch": 0.2909090909090909,
+      "grad_norm": 0.1611073613166809,
+      "learning_rate": 4.995674841986217e-06,
+      "loss": 0.3861,
+      "step": 8
+    },
+    {
+      "epoch": 0.36363636363636365,
+      "grad_norm": 0.1466752290725708,
+      "learning_rate": 4.991525037450412e-06,
+      "loss": 0.3941,
+      "step": 10
+    },
+    {
+      "epoch": 0.43636363636363634,
+      "grad_norm": 0.23374205827713013,
+      "learning_rate": 4.985995540019956e-06,
+      "loss": 0.385,
+      "step": 12
+    },
+    {
+      "epoch": 0.509090909090909,
+      "grad_norm": 0.2576417326927185,
+      "learning_rate": 4.979089411678252e-06,
+      "loss": 0.3865,
+      "step": 14
+    },
+    {
+      "epoch": 0.5818181818181818,
+      "grad_norm": 0.2165990173816681,
+      "learning_rate": 4.970810476724097e-06,
+      "loss": 0.3914,
+      "step": 16
+    },
+    {
+      "epoch": 0.6545454545454545,
+      "grad_norm": 0.13620640337467194,
+      "learning_rate": 4.961163319653959e-06,
+      "loss": 0.3682,
+      "step": 18
+    },
+    {
+      "epoch": 0.7272727272727273,
+      "grad_norm": 0.09898896515369415,
+      "learning_rate": 4.950153282623289e-06,
+      "loss": 0.3763,
+      "step": 20
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.08760473132133484,
+      "learning_rate": 4.937786462488284e-06,
+      "loss": 0.3793,
+      "step": 22
+    },
+    {
+      "epoch": 0.8727272727272727,
+      "grad_norm": 0.10171981155872345,
+      "learning_rate": 4.9240697074297205e-06,
+      "loss": 0.3823,
+      "step": 24
+    },
+    {
+      "epoch": 0.9454545454545454,
+      "grad_norm": 0.10795366764068604,
+      "learning_rate": 4.909010613160751e-06,
+      "loss": 0.3805,
+      "step": 26
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.08594109863042831,
+      "learning_rate": 4.892617518720737e-06,
+      "loss": 0.3662,
+      "step": 28
+    },
+    {
+      "epoch": 1.0727272727272728,
+      "grad_norm": 0.08396578580141068,
+      "learning_rate": 4.874899501857477e-06,
+      "loss": 0.368,
+      "step": 30
+    },
+    {
+      "epoch": 1.0727272727272728,
+      "eval_loss": 0.40457433462142944,
+      "eval_runtime": 16.4893,
+      "eval_samples_per_second": 6.065,
+      "eval_steps_per_second": 0.425,
+      "step": 30
+    },
+    {
+      "epoch": 1.1454545454545455,
+      "grad_norm": 0.06160604581236839,
+      "learning_rate": 4.85586637400036e-06,
+      "loss": 0.3608,
+      "step": 32
+    },
+    {
+      "epoch": 1.2181818181818183,
+      "grad_norm": 0.053206440061330795,
+      "learning_rate": 4.8355286748272405e-06,
+      "loss": 0.3423,
+      "step": 34
+    },
+    {
+      "epoch": 1.290909090909091,
+      "grad_norm": 0.05600437521934509,
+      "learning_rate": 4.813897666428054e-06,
+      "loss": 0.349,
+      "step": 36
+    },
+    {
+      "epoch": 1.3636363636363638,
+      "grad_norm": 0.06277399510145187,
+      "learning_rate": 4.790985327068376e-06,
+      "loss": 0.3569,
+      "step": 38
+    },
+    {
+      "epoch": 1.4363636363636363,
+      "grad_norm": 0.0717867836356163,
+      "learning_rate": 4.766804344556414e-06,
+      "loss": 0.359,
+      "step": 40
+    },
+    {
+      "epoch": 1.509090909090909,
+      "grad_norm": 0.051569655537605286,
+      "learning_rate": 4.741368109217072e-06,
+      "loss": 0.3489,
+      "step": 42
+    },
+    {
+      "epoch": 1.5818181818181818,
+      "grad_norm": 0.04518551751971245,
+      "learning_rate": 4.714690706477e-06,
+      "loss": 0.3382,
+      "step": 44
+    },
+    {
+      "epoch": 1.6545454545454545,
+      "grad_norm": 0.0480523444712162,
+      "learning_rate": 4.68678690906473e-06,
+      "loss": 0.3412,
+      "step": 46
+    },
+    {
+      "epoch": 1.7272727272727273,
+      "grad_norm": 0.04666323587298393,
+      "learning_rate": 4.657672168830211e-06,
+      "loss": 0.3536,
+      "step": 48
+    },
+    {
+      "epoch": 1.8,
+      "grad_norm": 0.05208882689476013,
+      "learning_rate": 4.627362608188281e-06,
+      "loss": 0.3483,
+      "step": 50
+    },
+    {
+      "epoch": 1.8727272727272726,
+      "grad_norm": 0.05304299667477608,
+      "learning_rate": 4.5958750111908065e-06,
+      "loss": 0.3408,
+      "step": 52
+    },
+    {
+      "epoch": 1.9454545454545453,
+      "grad_norm": 0.04273353889584541,
+      "learning_rate": 4.563226814232444e-06,
+      "loss": 0.3728,
+      "step": 54
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.08124157786369324,
+      "learning_rate": 4.529436096395157e-06,
+      "loss": 0.3783,
+      "step": 56
+    },
+    {
+      "epoch": 2.0727272727272728,
+      "grad_norm": 0.044167324900627136,
+      "learning_rate": 4.494521569436845e-06,
+      "loss": 0.3326,
+      "step": 58
+    },
+    {
+      "epoch": 2.1454545454545455,
+      "grad_norm": 0.041194308549165726,
+      "learning_rate": 4.4585025674296315e-06,
+      "loss": 0.3302,
+      "step": 60
+    },
+    {
+      "epoch": 2.1454545454545455,
+      "eval_loss": 0.4037678837776184,
+      "eval_runtime": 15.9657,
+      "eval_samples_per_second": 6.263,
+      "eval_steps_per_second": 0.438,
+      "step": 60
+    },
+    {
+      "epoch": 2.2181818181818183,
+      "grad_norm": 0.050775595009326935,
+      "learning_rate": 4.4213990360535274e-06,
+      "loss": 0.3303,
+      "step": 62
+    },
+    {
+      "epoch": 2.290909090909091,
+      "grad_norm": 0.03840049356222153,
+      "learning_rate": 4.383231521551432e-06,
+      "loss": 0.3142,
+      "step": 64
+    },
+    {
+      "epoch": 2.3636363636363638,
+      "grad_norm": 0.04364444687962532,
+      "learning_rate": 4.3440211593515556e-06,
+      "loss": 0.3329,
+      "step": 66
+    },
+    {
+      "epoch": 2.4363636363636365,
+      "grad_norm": 0.040008459240198135,
+      "learning_rate": 4.303789662363587e-06,
+      "loss": 0.3426,
+      "step": 68
+    },
+    {
+      "epoch": 2.509090909090909,
+      "grad_norm": 0.04201684519648552,
+      "learning_rate": 4.262559308955072e-06,
+      "loss": 0.3439,
+      "step": 70
+    },
+    {
+      "epoch": 2.581818181818182,
+      "grad_norm": 0.04128064960241318,
+      "learning_rate": 4.220352930614672e-06,
+      "loss": 0.3391,
+      "step": 72
+    },
+    {
+      "epoch": 2.6545454545454543,
+      "grad_norm": 0.03866910934448242,
+      "learning_rate": 4.177193899309127e-06,
+      "loss": 0.3299,
+      "step": 74
+    },
+    {
+      "epoch": 2.7272727272727275,
+      "grad_norm": 0.0435042642056942,
+      "learning_rate": 4.133106114540923e-06,
+      "loss": 0.323,
+      "step": 76
+    },
+    {
+      "epoch": 2.8,
+      "grad_norm": 0.03538796305656433,
+      "learning_rate": 4.088113990113846e-06,
+      "loss": 0.322,
+      "step": 78
+    },
+    {
+      "epoch": 2.8727272727272726,
+      "grad_norm": 0.04087506979703903,
+      "learning_rate": 4.042242440613724e-06,
+      "loss": 0.3247,
+      "step": 80
+    },
+    {
+      "epoch": 2.9454545454545453,
+      "grad_norm": 0.046281322836875916,
+      "learning_rate": 3.995516867611865e-06,
+      "loss": 0.3438,
+      "step": 82
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 0.04055408388376236,
+      "learning_rate": 3.947963145598833e-06,
+      "loss": 0.3276,
+      "step": 84
+    },
+    {
+      "epoch": 3.0727272727272728,
+      "grad_norm": 0.041513592004776,
+      "learning_rate": 3.899607607656334e-06,
+      "loss": 0.3252,
+      "step": 86
+    },
+    {
+      "epoch": 3.1454545454545455,
+      "grad_norm": 0.03839336708188057,
+      "learning_rate": 3.850477030875147e-06,
+      "loss": 0.3115,
+      "step": 88
+    },
+    {
+      "epoch": 3.2181818181818183,
+      "grad_norm": 0.04176229238510132,
+      "learning_rate": 3.8005986215272056e-06,
+      "loss": 0.309,
+      "step": 90
+    },
+    {
+      "epoch": 3.2181818181818183,
+      "eval_loss": 0.40795081853866577,
+      "eval_runtime": 15.995,
+      "eval_samples_per_second": 6.252,
+      "eval_steps_per_second": 0.438,
+      "step": 90
+    },
+    {
+      "epoch": 3.290909090909091,
+      "grad_norm": 0.04411574825644493,
+      "learning_rate": 3.7500000000000005e-06,
+      "loss": 0.3077,
+      "step": 92
+    },
+    {
+      "epoch": 3.3636363636363638,
+      "grad_norm": 0.038224343210458755,
+      "learning_rate": 3.6987091855016667e-06,
+      "loss": 0.2897,
+      "step": 94
+    },
+    {
+      "epoch": 3.4363636363636365,
+      "grad_norm": 0.048621904104948044,
+      "learning_rate": 3.6467545805452266e-06,
+      "loss": 0.3019,
+      "step": 96
+    },
+    {
+      "epoch": 3.509090909090909,
+      "grad_norm": 0.041004959493875504,
+      "learning_rate": 3.594164955220577e-06,
+      "loss": 0.3105,
+      "step": 98
+    },
+    {
+      "epoch": 3.581818181818182,
+      "grad_norm": 0.040131937712430954,
+      "learning_rate": 3.5409694312629193e-06,
+      "loss": 0.3201,
+      "step": 100
+    },
+    {
+      "epoch": 3.6545454545454543,
+      "grad_norm": 0.04099489003419876,
+      "learning_rate": 3.4871974659264786e-06,
+      "loss": 0.3149,
+      "step": 102
+    },
+    {
+      "epoch": 3.7272727272727275,
+      "grad_norm": 0.04340096190571785,
+      "learning_rate": 3.4328788356724135e-06,
+      "loss": 0.3044,
+      "step": 104
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 0.04188128933310509,
+      "learning_rate": 3.378043619679974e-06,
+      "loss": 0.3103,
+      "step": 106
+    },
+    {
+      "epoch": 3.8727272727272726,
+      "grad_norm": 0.037973225116729736,
+      "learning_rate": 3.322722183190025e-06,
+      "loss": 0.2999,
+      "step": 108
+    },
+    {
+      "epoch": 3.9454545454545453,
+      "grad_norm": 0.038421131670475006,
+      "learning_rate": 3.26694516069016e-06,
+      "loss": 0.312,
+      "step": 110
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 0.06082421541213989,
+      "learning_rate": 3.210743438950718e-06,
+      "loss": 0.276,
+      "step": 112
+    },
+    {
+      "epoch": 4.072727272727272,
+      "grad_norm": 0.037277910858392715,
+      "learning_rate": 3.154148139921102e-06,
+      "loss": 0.2977,
+      "step": 114
+    },
+    {
+      "epoch": 4.1454545454545455,
+      "grad_norm": 0.037613365799188614,
+      "learning_rate": 3.0971906034958616e-06,
+      "loss": 0.3004,
+      "step": 116
+    },
+    {
+      "epoch": 4.218181818181818,
+      "grad_norm": 0.03691912069916725,
+      "learning_rate": 3.0399023701600903e-06,
+      "loss": 0.2673,
+      "step": 118
+    },
+    {
+      "epoch": 4.290909090909091,
+      "grad_norm": 0.03884879872202873,
+      "learning_rate": 2.9823151635237424e-06,
+      "loss": 0.2901,
+      "step": 120
+    },
+    {
+      "epoch": 4.290909090909091,
+      "eval_loss": 0.41295385360717773,
+      "eval_runtime": 15.9766,
+      "eval_samples_per_second": 6.259,
+      "eval_steps_per_second": 0.438,
+      "step": 120
+    },
+    {
+      "epoch": 4.363636363636363,
+      "grad_norm": 0.040829263627529144,
+      "learning_rate": 2.924460872754547e-06,
+      "loss": 0.2902,
+      "step": 122
+    },
+    {
+      "epoch": 4.4363636363636365,
+      "grad_norm": 0.04225178807973862,
+      "learning_rate": 2.8663715349192388e-06,
+      "loss": 0.2951,
+      "step": 124
+    },
+    {
+      "epoch": 4.509090909090909,
+      "grad_norm": 0.036058977246284485,
+      "learning_rate": 2.8080793172428965e-06,
+      "loss": 0.2919,
+      "step": 126
+    },
+    {
+      "epoch": 4.581818181818182,
+      "grad_norm": 0.03855273872613907,
+      "learning_rate": 2.7496164992961995e-06,
+      "loss": 0.2897,
+      "step": 128
+    },
+    {
+      "epoch": 4.654545454545454,
+      "grad_norm": 0.039659108966588974,
+      "learning_rate": 2.691015455120468e-06,
+      "loss": 0.2932,
+      "step": 130
+    },
+    {
+      "epoch": 4.7272727272727275,
+      "grad_norm": 0.042663708329200745,
+      "learning_rate": 2.6323086353004077e-06,
+      "loss": 0.2725,
+      "step": 132
+    },
+    {
+      "epoch": 4.8,
+      "grad_norm": 0.03936437889933586,
+      "learning_rate": 2.573528548994449e-06,
+      "loss": 0.2939,
+      "step": 134
+    },
+    {
+      "epoch": 4.872727272727273,
+      "grad_norm": 0.0572555810213089,
+      "learning_rate": 2.5147077459326556e-06,
+      "loss": 0.2776,
+      "step": 136
+    },
+    {
+      "epoch": 4.945454545454545,
+      "grad_norm": 0.0380295105278492,
+      "learning_rate": 2.455878798392179e-06,
+      "loss": 0.2979,
+      "step": 138
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 0.035944413393735886,
+      "learning_rate": 2.397074283160206e-06,
+      "loss": 0.2715,
+      "step": 140
+    },
+    {
+      "epoch": 5.072727272727272,
+      "grad_norm": 0.0400959774851799,
+      "learning_rate": 2.338326763494429e-06,
+      "loss": 0.2746,
+      "step": 142
+    },
+    {
+      "epoch": 5.1454545454545455,
+      "grad_norm": 0.047982338815927505,
+      "learning_rate": 2.2796687710909966e-06,
+      "loss": 0.2705,
+      "step": 144
+    },
+    {
+      "epoch": 5.218181818181818,
+      "grad_norm": 0.04013433679938316,
+      "learning_rate": 2.2211327880699392e-06,
+      "loss": 0.2549,
+      "step": 146
+    },
+    {
+      "epoch": 5.290909090909091,
+      "grad_norm": 0.0582863911986351,
+      "learning_rate": 2.162751228988063e-06,
+      "loss": 0.2731,
+      "step": 148
+    },
+    {
+      "epoch": 5.363636363636363,
+      "grad_norm": 0.03681691363453865,
+      "learning_rate": 2.1045564228892404e-06,
+      "loss": 0.2705,
+      "step": 150
+    },
+    {
+      "epoch": 5.363636363636363,
+      "eval_loss": 0.41707590222358704,
+      "eval_runtime": 15.9844,
+      "eval_samples_per_second": 6.256,
+      "eval_steps_per_second": 0.438,
+      "step": 150
+    },
+    {
+      "epoch": 5.4363636363636365,
+      "grad_norm": 0.040616922080516815,
+      "learning_rate": 2.04658059540206e-06,
+      "loss": 0.2603,
+      "step": 152
+    },
+    {
+      "epoch": 5.509090909090909,
+      "grad_norm": 0.04648638516664505,
+      "learning_rate": 1.9888558508947496e-06,
+      "loss": 0.2665,
+      "step": 154
+    },
+    {
+      "epoch": 5.581818181818182,
+      "grad_norm": 0.03560300171375275,
+      "learning_rate": 1.9314141546972345e-06,
+      "loss": 0.2682,
+      "step": 156
+    },
+    {
+      "epoch": 5.654545454545454,
+      "grad_norm": 0.03742830082774162,
+      "learning_rate": 1.8742873154002007e-06,
+      "loss": 0.2726,
+      "step": 158
+    },
+    {
+      "epoch": 5.7272727272727275,
+      "grad_norm": 0.03804292157292366,
+      "learning_rate": 1.8175069672409476e-06,
+      "loss": 0.2581,
+      "step": 160
+    },
+    {
+      "epoch": 5.8,
+      "grad_norm": 0.03460890054702759,
+      "learning_rate": 1.7611045525857902e-06,
+      "loss": 0.2834,
+      "step": 162
+    },
+    {
+      "epoch": 5.872727272727273,
+      "grad_norm": 0.038920674473047256,
+      "learning_rate": 1.7051113045187123e-06,
+      "loss": 0.2746,
+      "step": 164
+    },
+    {
+      "epoch": 5.945454545454545,
+      "grad_norm": 0.04531797394156456,
+      "learning_rate": 1.6495582295459081e-06,
+      "loss": 0.2666,
+      "step": 166
+    },
+    {
+      "epoch": 6.0,
+      "grad_norm": 0.06146521493792534,
+      "learning_rate": 1.5944760904257944e-06,
+      "loss": 0.2675,
+      "step": 168
+    },
+    {
+      "epoch": 6.072727272727272,
+      "grad_norm": 0.0393206886947155,
+      "learning_rate": 1.5398953891339972e-06,
+      "loss": 0.2619,
+      "step": 170
+    },
+    {
+      "epoch": 6.1454545454545455,
+      "grad_norm": 0.03849950432777405,
+      "learning_rate": 1.485846349972751e-06,
+      "loss": 0.2582,
+      "step": 172
+    },
+    {
+      "epoch": 6.218181818181818,
+      "grad_norm": 0.03983594849705696,
+      "learning_rate": 1.4323589028340598e-06,
+      "loss": 0.2736,
+      "step": 174
+    },
+    {
+      "epoch": 6.290909090909091,
+      "grad_norm": 0.03905485197901726,
+      "learning_rate": 1.3794626666258868e-06,
+      "loss": 0.2428,
+      "step": 176
+    },
+    {
+      "epoch": 6.363636363636363,
+      "grad_norm": 0.04092865809798241,
+      "learning_rate": 1.3271869328705517e-06,
+      "loss": 0.2555,
+      "step": 178
+    },
+    {
+      "epoch": 6.4363636363636365,
+      "grad_norm": 0.036916088312864304,
+      "learning_rate": 1.2755606494844294e-06,
+      "loss": 0.2615,
+      "step": 180
+    },
+    {
+      "epoch": 6.4363636363636365,
+      "eval_loss": 0.4211080074310303,
+      "eval_runtime": 15.9838,
+      "eval_samples_per_second": 6.256,
+      "eval_steps_per_second": 0.438,
+      "step": 180
+    },
+    {
+      "epoch": 6.509090909090909,
+      "grad_norm": 0.03942929953336716,
+      "learning_rate": 1.2246124047479074e-06,
+      "loss": 0.2553,
+      "step": 182
+    },
+    {
+      "epoch": 6.581818181818182,
+      "grad_norm": 0.038776326924562454,
+      "learning_rate": 1.174370411474503e-06,
+      "loss": 0.2523,
+      "step": 184
+    },
+    {
+      "epoch": 6.654545454545454,
+      "grad_norm": 0.039575546979904175,
+      "learning_rate": 1.1248624913878966e-06,
+      "loss": 0.2616,
+      "step": 186
+    },
+    {
+      "epoch": 6.7272727272727275,
+      "grad_norm": 0.03467780724167824,
+      "learning_rate": 1.0761160597155288e-06,
+      "loss": 0.2675,
+      "step": 188
+    },
+    {
+      "epoch": 6.8,
+      "grad_norm": 0.036917053163051605,
+      "learning_rate": 1.028158110007294e-06,
+      "loss": 0.268,
+      "step": 190
+    },
+    {
+      "epoch": 6.872727272727273,
+      "grad_norm": 0.03764381632208824,
+      "learning_rate": 9.81015199187753e-07,
+      "loss": 0.2618,
+      "step": 192
+    },
+    {
+      "epoch": 6.945454545454545,
+      "grad_norm": 0.03786522150039673,
+      "learning_rate": 9.347134328501098e-07,
+      "loss": 0.2468,
+      "step": 194
+    },
+    {
+      "epoch": 7.0,
+      "grad_norm": 0.03737010434269905,
+      "learning_rate": 8.892784508001343e-07,
+      "loss": 0.2299,
+      "step": 196
+    },
+    {
+      "epoch": 7.072727272727272,
+      "grad_norm": 0.04024997726082802,
+      "learning_rate": 8.44735412857999e-07,
+      "loss": 0.2625,
+      "step": 198
+    },
+    {
+      "epoch": 7.1454545454545455,
+      "grad_norm": 0.042676106095314026,
+      "learning_rate": 8.011089849259263e-07,
+      "loss": 0.2615,
+      "step": 200
+    },
+    {
+      "epoch": 7.218181818181818,
+      "grad_norm": 0.03763395920395851,
+      "learning_rate": 7.584233253293327e-07,
+      "loss": 0.2476,
+      "step": 202
+    },
+    {
+      "epoch": 7.290909090909091,
+      "grad_norm": 0.03564968332648277,
+      "learning_rate": 7.167020714390502e-07,
+      "loss": 0.2453,
+      "step": 204
+    },
+    {
+      "epoch": 7.363636363636363,
+      "grad_norm": 0.03487633168697357,
+      "learning_rate": 6.759683265820294e-07,
+      "loss": 0.2557,
+      "step": 206
+    },
+    {
+      "epoch": 7.4363636363636365,
+      "grad_norm": 0.038702890276908875,
+      "learning_rate": 6.36244647247774e-07,
+      "loss": 0.2537,
+      "step": 208
+    },
+    {
+      "epoch": 7.509090909090909,
+      "grad_norm": 0.03509435057640076,
+      "learning_rate": 5.975530305975808e-07,
+      "loss": 0.2417,
+      "step": 210
+    },
+    {
+      "epoch": 7.509090909090909,
+      "eval_loss": 0.4235740005970001,
+      "eval_runtime": 15.9573,
+      "eval_samples_per_second": 6.267,
+      "eval_steps_per_second": 0.439,
+      "step": 210
+    },
+    {
+      "epoch": 7.581818181818182,
+      "grad_norm": 0.03575809672474861,
+      "learning_rate": 5.599149022835201e-07,
+      "loss": 0.2521,
+      "step": 212
+    },
+    {
+      "epoch": 7.654545454545454,
+      "grad_norm": 0.038031384348869324,
+      "learning_rate": 5.233511045838846e-07,
+      "loss": 0.2645,
+      "step": 214
+    },
+    {
+      "epoch": 7.7272727272727275,
+      "grad_norm": 0.03550686687231064,
+      "learning_rate": 4.878818848616861e-07,
+      "loss": 0.2416,
+      "step": 216
+    },
+    {
+      "epoch": 7.8,
+      "grad_norm": 0.035883497446775436,
+      "learning_rate": 4.5352688435259084e-07,
+      "loss": 0.2523,
+      "step": 218
+    },
+    {
+      "epoch": 7.872727272727273,
+      "grad_norm": 0.039348237216472626,
+      "learning_rate": 4.2030512728849946e-07,
+      "loss": 0.2476,
+      "step": 220
+    },
+    {
+      "epoch": 7.945454545454545,
+      "grad_norm": 0.03574049472808838,
+      "learning_rate": 3.882350103627952e-07,
+      "loss": 0.2514,
+      "step": 222
+    },
+    {
+      "epoch": 8.0,
+      "grad_norm": 0.05936720594763756,
+      "learning_rate": 3.5733429254309253e-07,
+      "loss": 0.241,
+      "step": 224
+    },
+    {
+      "epoch": 8.072727272727272,
+      "grad_norm": 0.03717917203903198,
+      "learning_rate": 3.276200852371339e-07,
+      "loss": 0.2488,
+      "step": 226
+    },
+    {
+      "epoch": 8.145454545454545,
+      "grad_norm": 0.03638555109500885,
+      "learning_rate": 2.9910884281727225e-07,
+      "loss": 0.2479,
+      "step": 228
+    },
+    {
+      "epoch": 8.218181818181819,
+      "grad_norm": 0.03493395075201988,
+      "learning_rate": 2.7181635350878645e-07,
+      "loss": 0.2398,
+      "step": 230
+    },
+    {
+      "epoch": 8.290909090909091,
+      "grad_norm": 0.0358763113617897,
+      "learning_rate": 2.4575773064708904e-07,
+      "loss": 0.2537,
+      "step": 232
+    },
+    {
+      "epoch": 8.363636363636363,
+      "grad_norm": 0.03697650507092476,
+      "learning_rate": 2.2094740430864569e-07,
+      "loss": 0.237,
+      "step": 234
+    },
+    {
+      "epoch": 8.436363636363636,
+      "grad_norm": 0.03936561197042465,
+      "learning_rate": 1.9739911332025796e-07,
+      "loss": 0.2358,
+      "step": 236
+    },
+    {
+      "epoch": 8.50909090909091,
+      "grad_norm": 0.03290196508169174,
+      "learning_rate": 1.7512589765112998e-07,
+      "loss": 0.2338,
+      "step": 238
+    },
+    {
+      "epoch": 8.581818181818182,
+      "grad_norm": 0.0346793606877327,
+      "learning_rate": 1.5414009119192635e-07,
+      "loss": 0.2569,
+      "step": 240
+    },
+    {
+      "epoch": 8.581818181818182,
+      "eval_loss": 0.42472872138023376,
+      "eval_runtime": 15.9951,
+      "eval_samples_per_second": 6.252,
+      "eval_steps_per_second": 0.438,
+      "step": 240
+    },
+    {
+      "epoch": 8.654545454545454,
+      "grad_norm": 0.040124256163835526,
+      "learning_rate": 1.3445331492482617e-07,
+      "loss": 0.2667,
+      "step": 242
+    },
+    {
+      "epoch": 8.727272727272727,
+      "grad_norm": 0.03517503663897514,
+      "learning_rate": 1.1607647048835463e-07,
+      "loss": 0.2355,
+      "step": 244
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 0.038074642419815063,
+      "learning_rate": 9.901973414055188e-08,
+      "loss": 0.2439,
+      "step": 246
+    },
+    {
+      "epoch": 8.872727272727273,
+      "grad_norm": 0.03690966218709946,
+      "learning_rate": 8.329255112382666e-08,
+      "loss": 0.2487,
+      "step": 248
+    },
+    {
+      "epoch": 8.945454545454545,
+      "grad_norm": 0.03566785156726837,
+      "learning_rate": 6.890363043461051e-08,
+      "loss": 0.252,
+      "step": 250
+    },
+    {
+      "epoch": 9.0,
+      "grad_norm": 0.0351359024643898,
+      "learning_rate": 5.5860940000714016e-08,
+      "loss": 0.2637,
+      "step": 252
+    },
+    {
+      "epoch": 9.072727272727272,
+      "grad_norm": 0.03571401536464691,
+      "learning_rate": 4.4171702269051874e-08,
+      "loss": 0.2386,
+      "step": 254
+    },
+    {
+      "epoch": 9.145454545454545,
+      "grad_norm": 0.04017382860183716,
+      "learning_rate": 3.3842390206180186e-08,
+      "loss": 0.2527,
+      "step": 256
+    },
+    {
+      "epoch": 9.218181818181819,
+      "grad_norm": 0.035146769136190414,
+      "learning_rate": 2.487872371386424e-08,
+      "loss": 0.2461,
+      "step": 258
+    },
+    {
+      "epoch": 9.290909090909091,
+      "grad_norm": 0.03707621991634369,
+      "learning_rate": 1.728566646165747e-08,
+      "loss": 0.2537,
+      "step": 260
+    },
+    {
+      "epoch": 9.363636363636363,
+      "grad_norm": 0.03554265946149826,
+      "learning_rate": 1.1067423138247103e-08,
+      "loss": 0.2353,
+      "step": 262
+    },
+    {
+      "epoch": 9.436363636363636,
+      "grad_norm": 0.03387082368135452,
+      "learning_rate": 6.2274371230905405e-09,
+      "loss": 0.2422,
+      "step": 264
+    },
+    {
+      "epoch": 9.50909090909091,
+      "grad_norm": 0.03694632276892662,
+      "learning_rate": 2.7683885796273014e-09,
+      "loss": 0.2448,
+      "step": 266
+    },
+    {
+      "epoch": 9.581818181818182,
+      "grad_norm": 0.03442605957388878,
+      "learning_rate": 6.921929711287134e-10,
+      "loss": 0.2482,
+      "step": 268
+    },
+    {
+      "epoch": 9.654545454545454,
+      "grad_norm": 0.03988894447684288,
+      "learning_rate": 0.0,
+      "loss": 0.2565,
+      "step": 270
+    },
+    {
+      "epoch": 9.654545454545454,
+      "eval_loss": 0.42475754022598267,
+      "eval_runtime": 15.9878,
+      "eval_samples_per_second": 6.255,
+      "eval_steps_per_second": 0.438,
+      "step": 270
+    }
+  ],
+  "logging_steps": 2,
+  "max_steps": 270,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 10,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.17181948959916e+17,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}