intelpen committed on
Commit 4b6c9e8 · verified · 1 Parent(s): e262bca

Update README.md

Files changed (1):
  1. README.md +5 -1
README.md CHANGED
@@ -6,8 +6,12 @@ tags: []
  # Model Card for Model ID

  This model is a Bits&Bytes 4-bit quantization of the https://huggingface.co/OpenLLM-Ro/RoLlama3-8b-Instruct model.
- You can run it on a GPU with 6 GB of free VRAM (so usually a GPU with 8 GB of VRAM).
+ The main advantages of this model are:
+ - it runs on a GPU with 6 GB of free VRAM (so usually a consumer-grade GPU with 8 GB of VRAM, versus the full model, which needs 48+ GB);
+ - it is 2-3 times faster in inference time per token.
+
+ The main drawback is that it is less accurate than the full (original) model; it is up to you to decide whether the compromise is a good fit for your use case.

  ## Model Details
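A quantized checkpoint like this is typically loaded through transformers' bitsandbytes integration. The sketch below makes the memory claim concrete and shows a common loading pattern; it is an assumption-labelled illustration, not this card's instructions: the repo id defaults to the *base* model named above (substitute this quantized repo's own Hub id), and the NF4/bfloat16 settings are widespread defaults that the card does not specify.

```python
# Back-of-envelope memory check: ~8B parameters at 4 bits (0.5 bytes) each.
PARAMS = 8e9  # ~8 billion parameters (RoLlama3-8b)
WEIGHT_GIB_4BIT = PARAMS * 0.5 / 2**30  # ~3.7 GiB of weights, within a 6 GB budget


def load_4bit(repo_id: str = "OpenLLM-Ro/RoLlama3-8b-Instruct"):
    """Load tokenizer and model in 4 bits.

    Requires torch, transformers, and bitsandbytes; repo_id is a placeholder
    for this quantized repo's Hub id (assumption -- the card does not name it).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumed quantization type
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    )
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        quantization_config=bnb_config,
        device_map="auto",  # place layers on the available GPU
    )
    return tokenizer, model
```

If the repo already ships a `quantization_config` in its `config.json` (usual for pre-quantized bitsandbytes checkpoints), `from_pretrained` picks it up automatically and the explicit `BitsAndBytesConfig` can be dropped.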