# **unsloth/Llama-3.2-11B-Vision-Instruct (Fine-Tuned)**

## **Model Overview**

This model, fine-tuned from the `unsloth/Llama-3.2-11B-Vision-Instruct` base, is optimized for vision-language tasks with enhanced instruction-following capabilities. Fine-tuning was completed 2x faster using the [Unsloth](https://github.com/unslothai/unsloth) framework combined with Hugging Face's TRL library, ensuring efficient training while maintaining high performance.

## **Key Information**

- **Developed by:** Daemontatox
- **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct`
- **License:** Apache-2.0
- **Language:** English (`en`)
- **Frameworks Used:** Hugging Face Transformers, Unsloth, and TRL

## **Performance and Use Cases**

This model is ideal for applications involving:

- Vision-based text generation and description tasks
- Instruction-following in multimodal contexts
- General-purpose text generation with enhanced reasoning

### **Features**

- **2x Faster Training:** Leveraging the Unsloth framework for accelerated fine-tuning.
- **Multimodal Capabilities:** Enhanced to handle vision-language interactions.
- **Instruction Optimization:** Tailored for improved comprehension and execution of instructions.
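
The exact training script and hyperparameters for this model are not published. For readers who want to reproduce a similar setup, a typical Unsloth + TRL vision fine-tuning run looks roughly like the sketch below; the dataset, LoRA settings, and `SFTConfig` values are illustrative assumptions, not the values used for this model. It requires a CUDA GPU with `unsloth` and `trl` installed.

```python
# Illustrative configuration sketch only — not this model's actual training
# script. Dataset and hyperparameters below are placeholder assumptions.
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,                # QLoRA-style 4-bit base weights
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # also adapt the vision tower
    r=16, lora_alpha=16,              # illustrative LoRA settings
)

dataset = ...  # your (image, conversation) pairs in chat-message format

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=60,                 # placeholder; tune for your data
        bf16=is_bf16_supported(),
        remove_unused_columns=False,  # keep image columns for the collator
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        output_dir="outputs",
    ),
)
trainer.train()
```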

## **How to Use**

### **Inference Example (Hugging Face Transformers)**

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Llama 3.2 Vision checkpoints load with the Mllama conditional-generation
# class and a processor (tokenizer + image processor), not AutoModelForCausalLM.
model_id = "Daemontatox/finetuned-llama-3.2-vision-instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Pair an image with a text instruction via the chat template.
image = Image.open("sunset_over_mountains.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(image, prompt, add_special_tokens=False,
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(outputs[0], skip_special_tokens=True))
```