Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
mptk-1b - bnb 8bits
- Model creator: https://huggingface.co/team-lucid/
- Original model: https://huggingface.co/team-lucid/mptk-1b/
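A minimal sketch of loading the 8-bit bitsandbytes quantization with `transformers` (assumes a CUDA GPU with the `bitsandbytes` and `accelerate` packages installed; the repo id below is a placeholder for wherever these quantized weights are hosted):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder repo id: substitute the actual location of the 8-bit weights.
repo_id = "RichardErkhov/team-lucid_-_mptk-1b-8bits"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bnb int8
    device_map="auto",  # requires the accelerate package
)
```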
Original model description:
---
license: apache-2.0
language:
- ko
---
# MPTK-1B
MPTK-1B is a 1.3B-parameter decoder-only transformer language model trained on Korean, English, and code datasets.
The model was trained on Cloud TPUs provided through Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/).
## Model Details
### Model Description
MPTK-1B is based on MPT, an architecture that modifies the standard decoder-only transformer in a few ways:
- It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409) instead of positional embeddings (see the sketch after this list).
- It uses no bias terms.
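A minimal, self-contained sketch of how ALiBi constructs its per-head linear attention biases, for illustration only (not this model's actual implementation):
```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalties added to attention scores.

    bias[h, i, j] = -slope[h] * (i - j), so no positional embeddings are needed.
    """
    # Geometric slope schedule from the ALiBi paper (power-of-two head counts).
    start = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]  # (j - i); <= 0 in the causal part
    return slopes[:, None, None] * distance[None, :, :]  # (n_heads, T, T)

# For this model: 16 heads, up to 2048 positions.
bias = alibi_bias(16, 2048)
```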
| Hyperparameter | Value |
|-----------------|-------|
| n_parameters | 1.3B |
| n_layers | 24 |
| n_heads | 16 |
| d_model | 2048 |
| vocab size | 50432 |
| sequence length | 2048 |
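These values can be checked against the published config (a sketch; the field names assume the standard MPT config schema):
```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("team-lucid/mptk-1b")
# d_model, n_heads, n_layers, vocab_size, max_seq_len are MPT config fields.
print(cfg.n_layers, cfg.n_heads, cfg.d_model, cfg.vocab_size, cfg.max_seq_len)
```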
## Uses
## How to Get Started with the Model
Running the model in fp16 can produce NaNs, so we recommend running it in fp32 or bf16.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("team-lucid/mptk-1b")
model = AutoModelForCausalLM.from_pretrained("team-lucid/mptk-1b")
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Autocast to bf16 for generation; fp16 can produce NaNs with this model.
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe(
            '대한민국의 수도는',  # "The capital of South Korea is"
            max_new_tokens=100,
            do_sample=True,
        )
    )
```
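Alternatively, on hardware with native bf16 support, the weights can be loaded directly in bf16 instead of autocasting (a sketch reusing the imports above; `device_map="auto"` additionally requires the `accelerate` package):
```python
model = AutoModelForCausalLM.from_pretrained(
    "team-lucid/mptk-1b",
    torch_dtype=torch.bfloat16,  # load weights in bf16 to avoid fp16 NaNs
    device_map="auto",
)
```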
## Training Details
### Training Data
The model was trained on Korean data including [OSCAR](https://oscar-project.org/), mC4, Wikipedia, and Namuwiki, supplemented with subsets of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) and [The Stack](https://huggingface.co/datasets/bigcode/the-stack).
#### Training Hyperparameters
| **Hyperparameter** | **Value** |
|--------------------|------------|
| Precision | bfloat16 |
| Optimizer | Lion |
| Learning rate | 2e-4 |
| Batch size | 1024 |
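For reference, a minimal sketch of a single Lion optimizer update (Chen et al., 2023) using the learning rate from the table above; the beta values are the paper's defaults, not confirmed training settings:
```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=2e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # Lion steps with the *sign* of an interpolated gradient, so every
    # coordinate moves by exactly lr (plus decoupled weight decay).
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    param.add_(update + wd * param, alpha=-lr)
    # Momentum is an EMA of raw gradients with the slower beta2.
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)
```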