# Model Card for Model ID
This model is a bitsandbytes 4-bit quantization of the https://huggingface.co/OpenLLM-Ro/RoLlama3-8b-Instruct model.
The main advantages of this model are:
- it runs on a GPU with 6 GB of free VRAM (so usually a consumer-grade GPU with 8 GB of VRAM, versus the standard, unquantized model, which needs 48+ GB).
- it is 2-3 times faster in inference time per token.
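As a rough sanity check on the 6 GB figure (a back-of-the-envelope estimate, not a measurement), 8B parameters at 4 bits each come to about 4 GB of weights; quantization constants, activations, and CUDA overhead account for the rest:

```python
# Back-of-the-envelope VRAM estimate for an 8B-parameter model quantized to 4-bit.
n_params = 8e9                  # ~8 billion parameters
bytes_per_weight = 0.5          # 4 bits = half a byte per weight
weights_gb = n_params * bytes_per_weight / 1e9
print(f"4-bit weights alone: ~{weights_gb:.0f} GB")  # KV cache and overhead add the rest
```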
The main drawback is that it is less accurate than the full (original) model, although it is up to you to decide whether the compromise is a good fit for your use case.
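A minimal loading sketch with the `transformers` library (an illustration, not necessarily the exact settings used for this checkpoint; it assumes the `transformers`, `torch`, and `bitsandbytes` packages are installed and quantizes the base model at load time):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization settings: NF4 with fp16 compute is a common choice;
# the exact settings used for this checkpoint are an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "OpenLLM-Ro/RoLlama3-8b-Instruct"  # base model referenced by this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU
)
```

Loading this way should fit within the ~6 GB VRAM budget described above.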
## Model Details