Update README.md
README.md
CHANGED
@@ -1,3 +1,66 @@
---
license: apache-2.0
---

# PULI-HuBA130M

PULI-HuBA130M is a monolingual Hungarian foundation model based on the Mamba configuration (https://huggingface.co/state-spaces/mamba-130m-hf).

Parameters:
MambaForCausalLM(
  (backbone): MambaModel(
    (embeddings): Embedding(52000, 768)
    (layers): ModuleList(
      (0-23): 24 x MambaBlock(
        (norm): MambaRMSNorm(768, eps=1e-05)
        (mixer): MambaMixer(
          (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
          (act): SiLU()
          (in_proj): Linear(in_features=768, out_features=3072, bias=False)
          (x_proj): Linear(in_features=1536, out_features=80, bias=False)
          (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
          (out_proj): Linear(in_features=1536, out_features=768, bias=False)
        )
      )
    )
    (norm_f): MambaRMSNorm(768, eps=1e-05)
  )
  (lm_head): Linear(in_features=768, out_features=52000, bias=False)
)
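
As a sanity check on the 130M figure, the layer shapes in the printout can be summed by hand. The sketch below is an illustration, not part of the model's code. It assumes three things the printout does not show: an SSM state size of 16 (inferred from `x_proj` emitting 48 + 2 × 16 = 80 features), per-block `A_log` and `D` tensors as in the reference Mamba design, and an `lm_head` that shares its weights with the embedding table. All function and constant names are hypothetical.

```python
# Parameter count derived from the module shapes printed above.
# Assumptions (not visible in the printout): SSM state size 16, per-block
# A_log and D tensors, and an lm_head tied to the embedding weights.

VOCAB, HIDDEN, INNER, LAYERS = 52000, 768, 1536, 24
DT_RANK, CONV_KERNEL = 48, 4
STATE_SIZE = (80 - DT_RANK) // 2  # x_proj emits dt_rank + 2 * state_size = 80


def block_params() -> int:
    """Trainable parameters in one MambaBlock (norm + mixer)."""
    norm = HIDDEN                                 # RMSNorm weight
    conv1d = INNER * CONV_KERNEL + INNER          # depthwise weight + bias
    in_proj = HIDDEN * 2 * INNER                  # 768 -> 3072, no bias
    x_proj = INNER * (DT_RANK + 2 * STATE_SIZE)   # 1536 -> 80, no bias
    dt_proj = DT_RANK * INNER + INNER             # 48 -> 1536, with bias
    out_proj = INNER * HIDDEN                     # 1536 -> 768, no bias
    ssm = INNER * STATE_SIZE + INNER              # A_log and D (assumed)
    return norm + conv1d + in_proj + x_proj + dt_proj + out_proj + ssm


def total_params(tied_lm_head: bool = True) -> int:
    """Total parameters; a tied lm_head adds nothing beyond the embeddings."""
    embeddings = VOCAB * HIDDEN
    final_norm = HIDDEN
    lm_head = 0 if tied_lm_head else HIDDEN * VOCAB
    return embeddings + LAYERS * block_params() + final_norm + lm_head


if __name__ == "__main__":
    print(f"per block: {block_params():,}")
    print(f"total:     {total_params():,}")
```

Under these assumptions each block holds 3,771,648 parameters and the whole model 130,456,320, which agrees with the stated size of 130 million.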

## Training Data (Pretraining)

Epoch 1: filtered_processed_oscar_hu
Toxic-filtered, deduplicated, semantically segmented dataset
Source: OSCAR dataset

Epoch 2: IM21
Combination of:
clean_index_dataset
MEK_dataset
digital_meteor_dataset
21_century_dataset
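
The card does not say how the OSCAR data was deduplicated. Purely as an illustration of what exact deduplication can look like, the sketch below drops documents whose whitespace-normalized, lowercased text has already been seen. The `normalize` and `deduplicate` helpers are hypothetical and are not the authors' pipeline; toxicity filtering and semantic segmentation are not covered.

```python
import hashlib
import re
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document whose normalized form has not been seen before."""
    seen: set[bytes] = set()
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc


if __name__ == "__main__":
    sample = ["Jó napot!", "jó  napot!", "Szia"]
    print(list(deduplicate(sample)))
```

Hashing keeps memory use proportional to the number of unique documents rather than their total length. Near-duplicate detection (for example MinHash) would need a different approach.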

## Training Details

License: Apache 2.0
Hardware: 4 × NVIDIA A100 (80 GB) GPUs
Year of training: 2024
Input/output: Text only
Parameter count: 130 million
Available model size: Single variant
Data type: float32
Batch size: 10 per GPU
Learning rate: 3e-4
Reference: GitHub issue
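
The figures above imply a global batch size and a weight footprint that are easy to work out. The sketch below assumes plain data-parallel training with one replica per GPU, and treats gradient accumulation as an unknown that defaults to 1, since the card does not state either. The constant and function names are hypothetical.

```python
NUM_GPUS = 4
PER_GPU_BATCH = 10
PARAMS = 130_000_000      # rounded parameter count from the card
BYTES_PER_FLOAT32 = 4


def global_batch_size(grad_accum_steps: int = 1) -> int:
    """Sequences per optimizer step under data parallelism (assumed)."""
    return NUM_GPUS * PER_GPU_BATCH * grad_accum_steps


def weight_memory_gib(params: int = PARAMS) -> float:
    """Memory for the float32 weights alone, in GiB."""
    return params * BYTES_PER_FLOAT32 / 2**30


if __name__ == "__main__":
    print(f"global batch size: {global_batch_size()}")
    print(f"float32 weights:   {weight_memory_gib():.2f} GiB")
```

With no gradient accumulation this gives 40 sequences per step and roughly 0.48 GiB for the weights. Gradients, optimizer state, and activations come on top of that and are not estimated here.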

## Ethical Considerations

Concerns:

The model may generate biased, incorrect, or harmful content.