---
license: apache-2.0
language:
- hu
base_model:
- state-spaces/mamba-130m-hf
pipeline_tag: text-generation
tags:
- Transformers
- mamba
---

# PULI-HuBA 130M

PULI-HuBA 130M is a monolingual Hungarian foundation model based on the Mamba architecture, following the configuration of [state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf).

Architecture:

```
MambaForCausalLM(
  (backbone): MambaModel(
    (embeddings): Embedding(52000, 768)
    (layers): ModuleList(
      (0-23): 24 x MambaBlock(
        (norm): MambaRMSNorm(768, eps=1e-05)
        (mixer): MambaMixer(
          (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
          (act): SiLU()
          (in_proj): Linear(in_features=768, out_features=3072, bias=False)
          (x_proj): Linear(in_features=1536, out_features=80, bias=False)
          (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
          (out_proj): Linear(in_features=1536, out_features=768, bias=False)
        )
      )
    )
    (norm_f): MambaRMSNorm(768, eps=1e-05)
  )
  (lm_head): Linear(in_features=768, out_features=52000, bias=False)
)
```
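
The printout above can be reproduced by loading the checkpoint and printing the module tree. The snippet below is a minimal sketch: it assumes a `transformers` version with Mamba support (4.39 or later) and uses the model id from the usage example further down.

```python
# Minimal sketch: reproduce the module printout above.
# Assumes transformers >= 4.39 (first release with Mamba support).
from transformers import MambaForCausalLM

model = MambaForCausalLM.from_pretrained("NYTK/PULI-HuBA130M")

print(model)  # prints the MambaForCausalLM module tree shown above
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # roughly 130 million
```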


## Training Data (Pretraining)

The model was trained on a Hungarian dataset of roughly 3.48 billion tokens that was filtered for toxic content, deduplicated, and semantically segmented.

## Training Details


- License: Apache 2.0
- Hardware: 4 × NVIDIA A100 (80 GB) GPUs
- Year of training: 2024
- Input/output: Text only
- Parameter count: 130 million
- Available model size: Single variant
- Data type: float32
- Batch size: 10 per GPU
- Learning rate: 3e-4
  - Reference: GitHub issue
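
The exact training script is not published in this card; the sketch below only mirrors the hyperparameters listed above in a Hugging Face `TrainingArguments` object. The output directory and epoch count are hypothetical placeholders, not documented values.

```python
# Hedged sketch: TrainingArguments mirroring the hyperparameters listed above.
# output_dir and num_train_epochs are placeholders; the real setup may differ.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="puli-huba-130m",     # hypothetical output path
    per_device_train_batch_size=10,  # batch size: 10 per GPU (4 x A100 80GB)
    learning_rate=3e-4,              # learning rate: 3e-4
    fp16=False,                      # trained in float32, so no mixed precision
    bf16=False,
    num_train_epochs=1,              # placeholder: not stated in the card
    report_to="none",
)
```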


## Ethical Considerations

Concerns:

- Potential for biased, incorrect, or harmful content generation.



## Usage Example

To generate text using this model with Hugging Face's `pipeline`, use the following Python code:  

```python
from transformers import pipeline

# Model identifier on the Hugging Face Hub
model_name = "NYTK/PULI-HuBA130M"

# Initialize the text generation pipeline
generator = pipeline("text-generation", model=model_name)

# Generate text with recommended parameters
output = generator(
    "Az a tény, hogy anyanyelvem magyar, és magyarul beszélek, gondolkozom, írok, életem legnagyobb eseménye, melyhez nincs fogható.",  # Example prompt in Hungarian
    max_length=156,
    do_sample=True,
    repetition_penalty=1.35,
    temperature=0.2,
    top_k=100,
    top_p=0.99,
    truncation=True
)

# Print the generated text
print(output[0]["generated_text"])
```
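
The same generation can be run without `pipeline` by calling `generate` directly. The sketch below assumes `AutoTokenizer` loads the tokenizer bundled with this checkpoint and reuses the prompt and sampling parameters from the example above.

```python
# Sketch: explicit tokenizer + generate() call with the same sampling parameters.
import torch
from transformers import AutoTokenizer, MambaForCausalLM

model_name = "NYTK/PULI-HuBA130M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MambaForCausalLM.from_pretrained(model_name)

prompt = ("Az a tény, hogy anyanyelvem magyar, és magyarul beszélek, gondolkozom, "
          "írok, életem legnagyobb eseménye, melyhez nincs fogható.")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=156,
        do_sample=True,
        repetition_penalty=1.35,
        temperature=0.2,
        top_k=100,
        top_p=0.99,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```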

## Contact

If you have any questions, please contact me: [email protected] or [email protected]