GaborMadarasz committed on
Commit 1c6afb5 · verified · 1 Parent(s): 3a3dd16

Update README.md

  1. README.md +66 -3
README.md CHANGED
@@ -1,3 +1,66 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ # PULI-HuBA130M
+
+ PULI-HuBA130M is a monolingual Hungarian foundation model based on the Mamba configuration
+ (https://huggingface.co/state-spaces/mamba-130m-hf).
+
+ Parameters:
+ MambaForCausalLM(
+   (backbone): MambaModel(
+     (embeddings): Embedding(52000, 768)
+     (layers): ModuleList(
+       (0-23): 24 x MambaBlock(
+         (norm): MambaRMSNorm(768, eps=1e-05)
+         (mixer): MambaMixer(
+           (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
+           (act): SiLU()
+           (in_proj): Linear(in_features=768, out_features=3072, bias=False)
+           (x_proj): Linear(in_features=1536, out_features=80, bias=False)
+           (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
+           (out_proj): Linear(in_features=1536, out_features=768, bias=False)
+         )
+       )
+     )
+     (norm_f): MambaRMSNorm(768, eps=1e-05)
+   )
+   (lm_head): Linear(in_features=768, out_features=52000, bias=False)
+ )
+
+
+ ## Training Data (Pretraining)
+
+ Epoch 1: filtered_processed_oscar_hu
+   Toxic-filtered, deduplicated, semantically segmented dataset
+   Source: OSCAR dataset
+
+ Epoch 2: IM21
+   Combination of:
+     clean_index_dataset
+     MEK_dataset
+     digital_meteor_dataset
+     21_century_dataset
+
+
+ ## Training Details
+
+
+ License: Apache 2.0
+ Hardware: 4 × NVIDIA A100 (80GB) GPUs
+ Year of training: 2024
+ Input/output: Text only
+ Parameter count: 130 million
+ Available model size: Single variant
+ Data type: float32
+ Batch size: 10 per GPU
+ Learning rate: 3e-4
+ Reference: GitHub issue
+
+
+ ## Ethical Considerations
+
+ Concerns:
+
+ Potential for biased, incorrect, or harmful content generation.
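
The layer shapes printed under "Parameters" can be checked against the stated parameter count of 130 million with a little arithmetic. The sketch below is not part of the README; it only multiplies out the printed shapes. Three things are assumptions rather than facts from the printout: that the depthwise `conv1d` has a bias, that each mixer also holds the usual Mamba state parameters `A_log` (1536 × 16) and `D` (1536), which a module printout does not show, and that `lm_head` shares its weight with the embedding. The state size of 16 is inferred from `x_proj` producing 80 = 48 + 2 × 16 features.

```python
# Shapes copied from the printed MambaForCausalLM module.
VOCAB, D_MODEL, D_INNER, N_LAYERS = 52000, 768, 1536, 24
CONV_KERNEL, DT_RANK, X_PROJ_OUT = 4, 48, 80

# Assumption: x_proj emits dt_rank + 2 * d_state features, which gives d_state = 16.
D_STATE = (X_PROJ_OUT - DT_RANK) // 2


def mixer_params() -> int:
    """Parameters in one MambaMixer, under the assumptions stated above."""
    conv1d = D_INNER * CONV_KERNEL + D_INNER  # depthwise weight + bias (bias assumed)
    in_proj = D_MODEL * 2 * D_INNER           # 768 -> 3072, no bias
    x_proj = D_INNER * X_PROJ_OUT             # 1536 -> 80, no bias
    dt_proj = DT_RANK * D_INNER + D_INNER     # 48 -> 1536, with bias
    out_proj = D_INNER * D_MODEL              # 1536 -> 768, no bias
    a_log = D_INNER * D_STATE                 # assumed: not visible in the printout
    d_skip = D_INNER                          # assumed: not visible in the printout
    return conv1d + in_proj + x_proj + dt_proj + out_proj + a_log + d_skip


def total_params(tie_lm_head: bool = True) -> int:
    """Total parameter count; tie_lm_head=True counts the 52000 x 768 matrix once."""
    embedding = VOCAB * D_MODEL
    block = D_MODEL + mixer_params()          # RMSNorm weight + mixer
    total = embedding + N_LAYERS * block + D_MODEL  # + final norm_f weight
    if not tie_lm_head:
        total += VOCAB * D_MODEL              # separate lm_head weight
    return total


if __name__ == "__main__":
    print(f"tied lm_head:   {total_params(True):,}")
    print(f"untied lm_head: {total_params(False):,}")
```

Under these assumptions the tied count comes to 130,456,320, which matches the stated 130 million; counting `lm_head` separately would give 170,392,320 instead.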