Erland committed on
Commit d2fdb73 · verified · 1 parent: 8e0ecfd

Update README.md with weight comparison and hardware info

Files changed (1):
  1. README.md +4 -4
README.md CHANGED
@@ -5,12 +5,12 @@ tags:
  - flax
  - text-generation
  - transformers
- - meta-llama/Llama-3.2-1B
+ - meta-llama/Llama-3.2-1B # Add the specific model name as a tag
  ---
 
  # meta-llama/Llama-3.2-1B - JAX/Flax
 
- This repository contains the JAX/Flax version of the meta-llama/Llama-3.2-1B model, originally a PyTorch model from {original_model_org}. This conversion enables efficient inference and training on TPUs and GPUs using the JAX/Flax framework.
+ This repository contains the JAX/Flax version of the meta-llama/Llama-3.2-1B model, originally a PyTorch model from meta-llama. This conversion enables efficient inference and training on TPUs and GPUs using the JAX/Flax framework.
 
  ## Model Description
 
@@ -27,7 +27,7 @@ This model was converted from the original PyTorch implementation to JAX/Flax. T
 
  ### Important Note about `max_position_embeddings`
 
- During the conversion process, it was necessary to modify the `max_position_embeddings` parameter in the model's configuration. The original value of {original_max_pos_embed} led to out-of-memory (OOM) errors on the hardware used for conversion. To resolve this, `max_position_embeddings` was adjusted to {new_max_pos_embed}.
+ During the conversion process, it was necessary to modify the `max_position_embeddings` parameter in the model's configuration. The original value of 131072 led to out-of-memory (OOM) errors on the hardware used for conversion. To resolve this, `max_position_embeddings` was adjusted to 32768.
 
  **Implications of this change:**
 
@@ -205,7 +205,7 @@ The conversion process was performed on the following hardware configuration:
  * **Transformers version:** 4.47.0
  * **GPU:** NVIDIA A100-SXM4-40GB
 
- This conversion took approximately 130.21 seconds to complete.
+ This conversion took approximately 52.90 seconds to complete.
 
  ## Usage
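The `max_position_embeddings` change described in the diff (131072 reduced to 32768) can be reproduced from user code rather than by editing the checkpoint's `config.json`. A minimal sketch, assuming the Hugging Face `transformers` library is installed; constructing the config locally here is purely illustrative and is not the converter's actual code:

```python
from transformers import LlamaConfig

# Hypothetical sketch: a Llama-style config carrying the reduced context
# length that the commit says was used to avoid OOM during conversion.
config = LlamaConfig(max_position_embeddings=32768)

print(config.max_position_embeddings)  # 32768

# When loading the real (gated) checkpoint, the same override can be passed
# to from_pretrained, which forwards unknown kwargs to the config, e.g.:
# model = FlaxAutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-3.2-1B", max_position_embeddings=32768
# )
```

Note that lowering `max_position_embeddings` caps the usable context window; inputs longer than 32768 tokens would exceed what the converted configuration declares.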