AlberBshara committed · verified
Commit 2f0b80c · 1 Parent(s): 05ef6d0

Update README.md

Files changed (1)
  1. README.md +95 -5
README.md CHANGED
@@ -8,15 +8,105 @@ tags:
  - unsloth
  - llama
  - trl
- base_model: unsloth/llama-3-8b-bnb-4bit
+ base_model: llama-3-8b
  ---

- # Uploaded model
+ # Uploaded Model

  - **Developed by:** AlberBshara
  - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3-8b-bnb-4bit
+ - **Finetuned from model:** llama-3-8b-bnb-4bit

- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
+ This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ Here I fine-tuned Llama 3 8B to perform the matching task in my Scholara virtual assistant: it matches the given student information against the provided list of scholarships (retrieved from my vector DB and my AI web agent) and returns the scholarships that best fit the student's background and preferences.
+
+ ## Example Usage
+
+ The following example demonstrates how to use the model. Inference requires at least one NVIDIA L4 GPU.
+
+ ```python
+ from typing import Tuple
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+
+ class ScholaraMatcher:
+     def __init__(self, load_in_4bit: bool = True,
+                  load_cpu_mem_usage: bool = True,
+                  hf_model_path: str = "AlberBshara/scholara_matching",
+                  k: int = 2):
+         """
+         Args:
+             load_in_4bit (bool): Use 4-bit quantization. Defaults to True.
+             load_cpu_mem_usage (bool): Reduce CPU memory usage. Defaults to True.
+             hf_model_path (str): The path of the model on the Hugging Face Hub, like "your-user-name/model-name".
+             k (int): The number of matched scholarships. Preferably 2 <= k <= 4.
+         """
+         assert torch.cuda.is_available(), "CUDA is not available. An NVIDIA GPU is required."
+         assert any("L4" in torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())), \
+             "An NVIDIA L4 GPU is required to initialize this class."
+
+         # Quantization config (4-bit by default).
+         self._bnb_config = BitsAndBytesConfig(load_in_4bit=load_in_4bit)
+
+         # Load the model with the quantization config.
+         self._model = AutoModelForCausalLM.from_pretrained(
+             hf_model_path,
+             low_cpu_mem_usage=load_cpu_mem_usage,
+             quantization_config=self._bnb_config,
+         )
+
+         # Load the tokenizer.
+         self._tokenizer = AutoTokenizer.from_pretrained(hf_model_path)
+         self._hf_model_path = hf_model_path
+         self._instruction = f"Based on the student details, select the best {k} scholarships for them only from the following given scholarships"
+         self._EOS_TOKEN_ID = self._tokenizer.eos_token_id
+
+         self._alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+
+ ### Instruction:
+ {}
+
+ ### Input:
+ {}
+
+ ### Response:
+ {}
+ """
+
+     def invoke(self, student_info: str, scholarships: str) -> Tuple:
+         if not student_info or not student_info.strip():
+             raise ValueError("student_info cannot be empty or None")
+
+         if not scholarships or not scholarships.strip():
+             raise ValueError("scholarships cannot be empty or None")
+
+         inputs = f"student details: \n [{student_info}]. \n scholarships list: \n {scholarships}"
+         inputs = self._tokenizer(
+             [
+                 self._alpaca_prompt.format(
+                     self._instruction,  # instruction
+                     inputs,             # input
+                     "",                 # output - left blank for generation
+                 )
+             ], return_tensors="pt"
+         ).to("cuda")
+
+         input_ids = inputs["input_ids"]
+         attention_mask = inputs["attention_mask"]
+
+         output_ids = self._model.generate(input_ids, attention_mask=attention_mask, pad_token_id=self._EOS_TOKEN_ID)
+
+         output_text = self._tokenizer.decode(output_ids[0], skip_special_tokens=True)
+
+         return output_text, output_ids, attention_mask, input_ids
+
+     def extract_answer(self, output: torch.Tensor) -> str:
+         """
+         Returns only the generated answer, stripping the instruction and input sections.
+         """
+         decoded_outputs = self._tokenizer.batch_decode(output)
+         response_text = decoded_outputs[0].split("### Response:")[1].strip()
+
+         return response_text
+ ```
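
For reference, a minimal usage sketch of the `ScholaraMatcher` class above. The `student_info` and `scholarships` strings are illustrative placeholders; the free-text format shown is an assumption, not a format the model strictly requires.

```python
# Minimal usage sketch. Requires an NVIDIA L4 GPU, since __init__ asserts one.
# The example inputs below are illustrative placeholders.
matcher = ScholaraMatcher(k=2)

student_info = (
    "Undergraduate computer science student, GPA 3.7, "
    "looking for a fully funded master's scholarship abroad."
)
scholarships = (
    "1) Scholarship A: fully funded master's programs in Europe, GPA >= 3.5 required.\n"
    "2) Scholarship B: partial tuition waiver for engineering undergraduates.\n"
    "3) Scholarship C: research fellowship for PhD applicants in the life sciences."
)

output_text, output_ids, attention_mask, input_ids = matcher.invoke(student_info, scholarships)

# Keep only the model's answer, without the instruction/input scaffolding.
print(matcher.extract_answer(output_ids))
```

`invoke` returns the full decoded text along with the raw tensors, so `extract_answer` is applied to `output_ids` to isolate the text after the "### Response:" marker.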