---
library_name: transformers
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
license: llama3.1
model-index:
- name: Meta-Llama-3.1-70B-Instruct-INT8
  results: []
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---

# Model Card for Meta-Llama-3.1-70B-Instruct-INT8

This is a quantized version of `Llama 3.1 70B Instruct`, quantized to **8-bit (INT8)** with `bitsandbytes` and `accelerate`.

- **Developed by:** Farid Saud @ DSRS
- **License:** llama3.1
- **Base Model:** meta-llama/Meta-Llama-3.1-70B-Instruct
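
The quantization can be reproduced roughly as follows. This is a minimal sketch, not the exact script used to build this repository; the output directory is an illustrative local path:

```python
# Quantize the base model to 8-bit on load, then save the quantized checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",  # requires accelerate; shards the model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-70B-Instruct")

model.save_pretrained("Meta-Llama-3.1-70B-Instruct-INT8")  # illustrative path
tokenizer.save_pretrained("Meta-Llama-3.1-70B-Instruct-INT8")
```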

## Use this model


Use a pipeline as a high-level helper:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="fsaudm/Meta-Llama-3.1-70B-Instruct-INT8",
    device_map="auto",  # requires accelerate; spreads the 8-bit weights across available GPUs
)

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```



Load the tokenizer and model directly:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")
model = AutoModelForCausalLM.from_pretrained(
    "fsaudm/Meta-Llama-3.1-70B-Instruct-INT8",
    device_map="auto",  # requires accelerate; spreads the 8-bit weights across available GPUs
)
```
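
Once loaded, generation follows the standard chat-template flow. The sketch below is a minimal example; `max_new_tokens` and the greedy decoding setting are illustrative choices, not values recommended by the model author:

```python
# Format the conversation with the Llama 3.1 chat template and generate a reply
messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```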

Information about the base model can be found in the original [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) model card.