# Model Card for Model ID
This model is a bitsandbytes 4-bit quantization of the https://huggingface.co/OpenLLM-Ro/RoLlama3-8b-Instruct model.
The main advantages of this model are:
- it runs on a GPU with 6 GB of free VRAM (so usually a consumer-grade GPU with 8 GB of VRAM, versus the standard, unquantized model, which needs 48+ GB).
- it is 2-3 times faster in inference time per token.
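As a rough sanity check on the 6 GB figure (a back-of-the-envelope estimate, not a measurement), 8B parameters at 4 bits each come to about 4 GB of weights; quantization constants, activations, and CUDA overhead account for the rest:

```python
# Back-of-the-envelope VRAM estimate for an 8B-parameter model quantized to 4-bit.
n_params = 8e9                  # ~8 billion parameters
bytes_per_weight = 0.5          # 4 bits = half a byte per weight
weights_gb = n_params * bytes_per_weight / 1e9
print(f"4-bit weights alone: ~{weights_gb:.0f} GB")  # KV cache and overhead add the rest
```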
The main drawback is that it is less accurate than the full (original) model, although it is up to you to decide whether the compromise is a good fit for your use case.
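A minimal loading sketch with the `transformers` library (an illustration, not necessarily the exact settings used for this checkpoint; it assumes the `transformers`, `torch`, and `bitsandbytes` packages are installed and quantizes the base model at load time):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization settings: NF4 with fp16 compute is a common choice;
# the exact settings used for this checkpoint are an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "OpenLLM-Ro/RoLlama3-8b-Instruct"  # base model referenced by this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU
)
```

Loading this way should fit within the ~6 GB VRAM budget described above.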
## Model Details