---
license: llama2
datasets:
- ju-resplande/rebel-pt
- paulofinardi/OIG_small_chip2_portuguese_brasil
- Guilherme34/Cabrita-lora-ptbr
- dominguesm/Canarim-Instruct-PTBR-Dataset
language:
- pt
- en
pipeline_tag: text-generation
library_name: transformers
widget:
- text: |
    Quem foi Pedro Álvares Cabral?
tags:
- Portuguese
- Llama
- Tiny-Llama
- LLM
- PEFT
---

<hr>

# PT - README

<hr>

# Samba-1.1B

<p align="center">
  <img width="250" alt="Samba Logo" src="https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/MuRvqTWpp-d0NRYQ0yRPL.png">
</p>

Samba é um LLM treinado em dados da língua portuguesa. O modelo é baseado no [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), um modelo de 1,1 bilhão de parâmetros que segue a arquitetura do LLaMA-2.

O projeto Samba tem como objetivo fornecer mais opções de LLMs para a língua portuguesa e, ao mesmo tempo, disponibilizar um modelo menos complexo, para que usuários com menos poder computacional também possam usufruir de LLMs.

<p align="center">
  <img width="250" alt="Countries Logo" src="https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/d3twZrXng5eDjg_LbH4pF.png">
</p>

### Descrição do Modelo

- **Desenvolvido por:** [Leonardo Souza](https://huggingface.co/lrds-code)
- **Tipo do Modelo:** LLaMA-Based
- **Licença:** Academic Free License v3.0
- **Fine-tunado do modelo:** [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

## Como usar

```python
import torch
from transformers import pipeline

samba = pipeline('text-generation', model='lrds-code/samba-1.1B', torch_dtype=torch.bfloat16, device_map='auto')

messages = [{"role": "system",
             "content": ""},
            {"role": "user",
             "content": "Quantos planetas existem no sistema solar?"}]

prompt = samba.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Os parâmetros de amostragem (temperature, top_k, top_p) só têm efeito com do_sample=True.
outputs = samba(prompt, max_new_tokens=256, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
print(outputs[0]['generated_text'])
```
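O `apply_chat_template` acima converte a lista de mensagens no formato de chat que o modelo espera. Como ilustração, um esboço mínimo (supondo o template de chat no estilo Zephyr usado pelo TinyLlama-Chat, do qual o Samba deriva; o nome `build_prompt` é apenas ilustrativo) do texto que esse passo produz:

```python
def build_prompt(messages):
    """Monta um prompt de chat no estilo Zephyr (suposição ilustrativa).

    Cada mensagem vira um bloco '<|papel|>\\n conteúdo </s>', e o prompt
    termina com '<|assistant|>' para sinalizar que é a vez do modelo gerar.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}</s>\n")
    parts.append("<|assistant|>\n")  # equivale a add_generation_prompt=True
    return "".join(parts)


messages = [{"role": "system", "content": ""},
            {"role": "user", "content": "Quantos planetas existem no sistema solar?"}]
print(build_prompt(messages))
```

Na prática, prefira sempre `samba.tokenizer.apply_chat_template`, que usa o template oficial armazenado junto ao tokenizador.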

<hr>

# EN - README

<hr>

# Samba-1.1B

<p align="center">
  <img width="250" alt="Samba Logo" src="https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/MuRvqTWpp-d0NRYQ0yRPL.png">
</p>

Samba is an LLM trained on Portuguese-language data. The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), a 1.1B-parameter model that follows the LLaMA-2 architecture.

The Samba project aims to expand the range of LLMs available for Portuguese while also offering a smaller, less complex model, so that users with limited computational resources can still take advantage of LLMs.

<p align="center">
  <img width="250" alt="Countries Logo" src="https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/d3twZrXng5eDjg_LbH4pF.png">
</p>

### Model Description

- **Developed by:** [Leonardo Souza](https://huggingface.co/lrds-code)
- **Model type:** LLaMA-Based
- **License:** Academic Free License v3.0
- **Finetuned from model:** [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

## How to Use

```python
import torch
from transformers import pipeline

samba = pipeline('text-generation', model='lrds-code/samba-1.1B', torch_dtype=torch.bfloat16, device_map='auto')

messages = [{"role": "system",
             "content": ""},
            {"role": "user",
             "content": "Quantos planetas existem no sistema solar?"}]

prompt = samba.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Sampling parameters (temperature, top_k, top_p) only take effect with do_sample=True.
outputs = samba(prompt, max_new_tokens=256, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
print(outputs[0]['generated_text'])
```
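Note that `generated_text` contains the prompt followed by the completion. A minimal, model-free sketch of recovering just the answer (the helper `extract_answer` and the example strings below are illustrative, not real model output):

```python
def extract_answer(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt from a text-generation pipeline output.

    The pipeline returns the prompt followed by the completion, so the
    answer is the suffix that remains after removing the prompt prefix.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


# Hypothetical pipeline output, for illustration only.
prompt = "Quantos planetas existem no sistema solar?"
generated = prompt + " Existem oito planetas no sistema solar."
print(extract_answer(generated, prompt))  # → Existem oito planetas no sistema solar.
```

Alternatively, passing `return_full_text=False` to the pipeline call makes it return only the completion, with no post-processing needed.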