NYTK/PULI-HuBA-mamba-130M

Text Generation

GaborMadarasz committed on 20 days ago

Commit 1c6afb5 · verified · 1 Parent(s): 3a3dd16

Update README.md

Files changed (1)

README.md +66 -3

@@ -1,3 +1,66 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# PULI-HuBA130M
+PULI-HuBA130M is a monolingual Hungarian foundation model based on the Mamba configuration of
+state-spaces/mamba-130m-hf (https://huggingface.co/state-spaces/mamba-130m-hf).
+Architecture (module printout):
+MambaForCausalLM(
+  (backbone): MambaModel(
+    (embeddings): Embedding(52000, 768)
+    (layers): ModuleList(
+      (0-23): 24 x MambaBlock(
+        (norm): MambaRMSNorm(768, eps=1e-05)
+        (mixer): MambaMixer(
+          (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
+          (act): SiLU()
+          (in_proj): Linear(in_features=768, out_features=3072, bias=False)
+          (x_proj): Linear(in_features=1536, out_features=80, bias=False)
+          (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
+          (out_proj): Linear(in_features=1536, out_features=768, bias=False)
+        )
+      )
+    )
+    (norm_f): MambaRMSNorm(768, eps=1e-05)
+  )
+  (lm_head): Linear(in_features=768, out_features=52000, bias=False)
+)
+## Training Data (Pretraining)
+    Epoch 1: filtered_processed_oscar_hu
+        Toxic-filtered, deduplicated, semantically segmented dataset
+        Source: OSCAR dataset
+    Epoch 2: IM21
+        Combination of:
+            clean_index_dataset
+            MEK_dataset
+            digital_meteor_dataset
+            21_century_dataset
+## Training Details
+    License: Apache 2.0
+    Hardware: 4 × NVIDIA A100 (80GB) GPUs
+    Year of training: 2024
+    Input/output: Text only
+    Parameter count: 130 million
+    Available model size: Single variant
+    Data type: float32
+    Batch size: 10 per GPU
+    Learning rate: 3e-4
+        Reference: GitHub issue
+## Ethical Considerations
+Concerns:
+    Potential for biased, incorrect, or harmful content generation.
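
Review note: the stated parameter count of 130 million can be cross-checked against the layer shapes in the module printout. The sketch below is only an estimate under two assumptions that the printout does not confirm: each mixer also holds an `A_log` matrix and a `D` vector (these are plain parameters in the Hugging Face Mamba implementation, so they do not appear in a module printout), and `lm_head` shares its weights with the embedding. All constant and function names are illustrative.

```python
# Shapes copied from the module printout in the README diff.
VOCAB, D_MODEL, D_INNER, N_LAYERS = 52000, 768, 1536, 24
CONV_KERNEL, DT_RANK, X_PROJ_OUT = 4, 48, 80

# x_proj emits dt_rank + 2 * state_size features, so state_size = (80 - 48) / 2.
STATE_SIZE = (X_PROJ_OUT - DT_RANK) // 2


def mixer_params() -> int:
    """Parameters in one MambaMixer."""
    conv1d = D_INNER * CONV_KERNEL + D_INNER   # depthwise weights + bias
    in_proj = D_MODEL * 2 * D_INNER            # 768 -> 3072, no bias
    x_proj = D_INNER * X_PROJ_OUT              # 1536 -> 80, no bias
    dt_proj = DT_RANK * D_INNER + D_INNER      # 48 -> 1536, with bias
    out_proj = D_INNER * D_MODEL               # 1536 -> 768, no bias
    # ASSUMPTION: A_log (D_INNER x STATE_SIZE) and D (D_INNER) exist but are
    # not listed in the printout because they are bare parameters.
    a_log = D_INNER * STATE_SIZE
    d_skip = D_INNER
    return conv1d + in_proj + x_proj + dt_proj + out_proj + a_log + d_skip


def block_params() -> int:
    """One MambaBlock: RMSNorm weight plus the mixer."""
    return D_MODEL + mixer_params()


def total_params(tied_lm_head: bool = True) -> int:
    """Whole model; tied_lm_head=True is an ASSUMPTION about weight sharing."""
    embeddings = VOCAB * D_MODEL
    backbone = N_LAYERS * block_params() + D_MODEL  # layers + final norm_f
    lm_head = 0 if tied_lm_head else D_MODEL * VOCAB
    return embeddings + backbone + lm_head


print(f"{total_params(tied_lm_head=True):,}")   # 130,456,320
print(f"{total_params(tied_lm_head=False):,}")  # 170,392,320
```

With a tied `lm_head` the estimate lands at roughly 130 million, which matches the "Parameter count" entry; counting `lm_head` separately gives about 170 million, so the README figure appears to assume shared embedding weights.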