---
license: afl-3.0
datasets:
- ju-resplande/rebel-pt
- paulofinardi/OIG_small_chip2_portuguese_brasil
- Guilherme34/Cabrita-lora-ptbr
- dominguesm/Canarim-Instruct-PTBR-Dataset
language:
- en
- pt
pipeline_tag: text-generation
library_name: transformers
widget:
  - text: >
      Pergunta: Quantos planetas existem no sistema solar?
---
# Samba-1.1B

<p align="center">
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/J7yD7tR6y1oEH2RRxDyMT.png)
</p>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/J7yD7tR6y1oEH2RRxDyMT.png)

<p align="center">
  <img width="250" alt="Samba Logo" src="https://cdn-uploads.huggingface.co/production/uploads/658c21f4c1229bf113295773/J7yD7tR6y1oEH2RRxDyMT.png">
</p>

Samba is an LLM trained on Portuguese-language data. The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), a 1.1B-parameter model built on the LLaMA-2 architecture.

The Samba project aims to expand the range of LLM options available in Portuguese, while keeping the models small enough that users with limited computational resources can still run them.

In support of Portuguese-speaking countries. 🇦🇴🇧🇷🇨🇻🇬🇼🇬🇶🇲🇿🇵🇹🇸🇹🇹🇱

## Model Details

This model was fine-tuned on four datasets ([rebel-pt](https://huggingface.co/datasets/ju-resplande/rebel-pt), [OIG_small_chip2_portuguese_brasil](https://huggingface.co/datasets/paulofinardi/OIG_small_chip2_portuguese_brasil), [Cabrita-lora-ptbr](https://huggingface.co/datasets/Guilherme34/Cabrita-lora-ptbr) and [Canarim-Instruct-PTBR-Dataset](https://huggingface.co/datasets/dominguesm/Canarim-Instruct-PTBR-Dataset)) of Portuguese data, totaling approximately 1.4 million samples.
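As a rough illustration of how such a mixture could be assembled with the `datasets` library (the exact preprocessing used for training is not published here, so the split name below is an assumption):

```python
from datasets import load_dataset

# Dataset IDs are taken from this model card; the 'train' split name
# is an assumption for illustration only.
dataset_ids = [
    'ju-resplande/rebel-pt',
    'paulofinardi/OIG_small_chip2_portuguese_brasil',
    'Guilherme34/Cabrita-lora-ptbr',
    'dominguesm/Canarim-Instruct-PTBR-Dataset',
]

parts = {ds_id: load_dataset(ds_id, split='train') for ds_id in dataset_ids}

# Concatenating these for training would first require mapping each
# dataset to a common (instruction, response) schema.
total = sum(len(ds) for ds in parts.values())
print(f'{total:,} samples')  # approximately 1.4 million in total
```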

## Limitations

Keep in mind the limitations of this model. With only 1.1B trained parameters, it may produce glitches or hallucinations.

## Future Updates

- Add more Portuguese-language data.
- Make quantized versions available (in the meantime, see the on-the-fly quantization sketch below).
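
Until official quantized checkpoints are published, the model can be quantized at load time. A minimal sketch using 4-bit `bitsandbytes` quantization through `transformers` (the `BitsAndBytesConfig` values are common defaults, not settings validated for this model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization; these are common defaults, not values
# tuned or validated for Samba-1.1B.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    'lrds-code/samba-1.1B',
    quantization_config=bnb_config,
    device_map='auto',  # requires a CUDA GPU and the bitsandbytes package
)
tokenizer = AutoTokenizer.from_pretrained('lrds-code/samba-1.1B')
```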

## Model Description

- **Developed by:** [Leonardo Souza](https://huggingface.co/lrds-code)
- **Model type:** LLaMA-based
- **License:** Academic Free License v3.0
- **Finetuned from model:** [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('lrds-code/samba-1.1B')
tokenizer = AutoTokenizer.from_pretrained('lrds-code/samba-1.1B')

text = 'Pergunta: Como desenvolver habilidades de programação em python?'
inputs = tokenizer(text, return_tensors='pt')

# max_new_tokens is an illustrative cap; without it, generate() falls
# back to the model's default generation settings.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

## Pergunta: Como desenvolver habilidades de programação em python?
## Resposta: Para desenvolver habilidades de programação em Python, você precisa aprender a ler e escrever código.
##           Você também precisa entender o que significa cada parte do código e como ela funciona.
##           Você também precisa entender como usar bibliotecas e frameworks para criar aplicativos.
##           Além disso, você precisa entender como usar o IDE (Integrated Development Environment) para desenvolver e testar seu código.
```
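
The model can also be called through the `pipeline` API, matching the `text-generation` tag in the card metadata. A minimal sketch reusing the widget prompt from this card (the sampling values are illustrative, not recommended settings):

```python
from transformers import pipeline

generator = pipeline('text-generation', model='lrds-code/samba-1.1B')

# Sampling parameters are illustrative, not tuned recommendations.
result = generator(
    'Pergunta: Quantos planetas existem no sistema solar?',
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(result[0]['generated_text'])
```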