---
license: apache-2.0
language:
- hu
base_model:
- state-spaces/mamba-130m-hf
pipeline_tag: text-generation
tags:
- Transformers
- mamba
---
# PULI-HuBA 130M
PULI-HuBA 130M is a monolingual Hungarian foundation model based on the Mamba architecture, following the configuration of [state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf).
Model architecture:
```text
MambaForCausalLM(
  (backbone): MambaModel(
    (embeddings): Embedding(52000, 768)
    (layers): ModuleList(
      (0-23): 24 x MambaBlock(
        (norm): MambaRMSNorm(768, eps=1e-05)
        (mixer): MambaMixer(
          (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
          (act): SiLU()
          (in_proj): Linear(in_features=768, out_features=3072, bias=False)
          (x_proj): Linear(in_features=1536, out_features=80, bias=False)
          (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
          (out_proj): Linear(in_features=1536, out_features=768, bias=False)
        )
      )
    )
    (norm_f): MambaRMSNorm(768, eps=1e-05)
  )
  (lm_head): Linear(in_features=768, out_features=52000, bias=False)
)
```
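The printout above can be reproduced by loading the checkpoint and printing the module tree (a minimal sketch; assumes a `transformers` version with Mamba support, and takes the repository id from the usage example below):

```python
from transformers import MambaForCausalLM

# Load the checkpoint
model = MambaForCausalLM.from_pretrained("NYTK/PULI-HuBA130M")

# Printing the model yields the module tree shown above
print(model)

# Quick check of the stated ~130M parameter count
print(f"{sum(p.numel() for p in model.parameters()):,}")
```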
## Training Data (Pretraining)
The model was pretrained on a ~3.48B-token dataset that was toxicity-filtered, deduplicated, and semantically segmented.
## Training Details
- License: Apache 2.0
- Hardware: 4 × NVIDIA A100 (80 GB) GPUs
- Year of training: 2024
- Input/output: text only
- Parameter count: 130 million
- Available model size: single variant
- Data type: float32
- Batch size: 10 per GPU
- Learning rate: 3e-4
- Reference: GitHub issue
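
For orientation, the reported hyperparameters map roughly onto a Hugging Face `TrainingArguments` configuration as sketched below (illustrative only; the original pretraining script is not published in this card, and `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Illustrative mapping of the settings reported above
args = TrainingArguments(
    output_dir="puli-huba-130m",     # placeholder path
    per_device_train_batch_size=10,  # "Batch size: 10 per GPU"
    learning_rate=3e-4,              # "Learning rate: 3e-4"
    fp16=False,                      # trained in float32 per this card
    bf16=False,
)
```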
## Ethical Considerations
Like other generative language models, PULI-HuBA 130M can produce biased, incorrect, or harmful content.
## Usage Example
To generate text using this model with Hugging Face's `pipeline`, use the following Python code:
```python
from transformers import pipeline

# Load the model
model_name = "NYTK/PULI-HuBA130M"

# Initialize the text-generation pipeline
generator = pipeline("text-generation", model=model_name)

# Generate text with the recommended parameters.
# The example prompt is Hungarian: "The fact that my mother tongue is
# Hungarian, and that I speak, think, and write in Hungarian, is the
# greatest event of my life, to which nothing compares."
output = generator(
    "Az a tény, hogy anyanyelvem magyar, és magyarul beszélek, gondolkozom, írok, életem legnagyobb eseménye, melyhez nincs fogható.",
    max_length=156,
    do_sample=True,
    repetition_penalty=1.35,
    temperature=0.2,
    top_k=100,
    top_p=0.99,
    truncation=True,
)

# Print the generated text
print(output[0]["generated_text"])
```
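For finer control than `pipeline` offers, the same generation can be done with the tokenizer and `generate()` directly (a minimal sketch reusing the sampling parameters recommended above; the short prompt is only an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "NYTK/PULI-HuBA130M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a Hungarian prompt
inputs = tokenizer("Az a tény, hogy anyanyelvem magyar,", return_tensors="pt")

# Sample with the same settings as in the pipeline example
with torch.no_grad():
    ids = model.generate(
        **inputs,
        max_length=156,
        do_sample=True,
        repetition_penalty=1.35,
        temperature=0.2,
        top_k=100,
        top_p=0.99,
    )

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```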
## Contact
If you have any questions, please contact us: [email protected] or [email protected]