AdithyaSK committed on
Commit 72e1063 · verified · 1 Parent(s): a0f171e

Update README.md

Files changed (1)
  1. README.md +1 -165
README.md CHANGED
@@ -1,172 +1,8 @@
  ---
  library_name: transformers
- tags:
- - hindi
- - bilingual
  license: llama2
  language:
- - hi
  - en
  ---
 
- # LLama3-Gaja-Hindi-8B-v0.1
-
- ## Overview
-
- LLama3-Gaja-Hindi-8B-v0.1 is an extension of the Ambari series, a bilingual English/Hindi model family developed and released by [Cognitivelab.in](https://www.cognitivelab.in/). The model is specialized for natural language understanding tasks, particularly instruction following. It is built on the [Llama3 8b](https://huggingface.co/meta-llama/Meta-Llama-3-8B) base model and fine-tuned on a curated dataset of translated instructional pairs.
-
- <img src="https://cdn-uploads.huggingface.co/production/uploads/6442d975ad54813badc1ddf7/G0u9L6RQJFinST0chQmfL.jpeg" width="500px">
-
- ## Generate
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
- tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
-
- # Conversation history: system prompt plus the user's message
- messages = [
-     {"role": "system", "content": "You are Gaja, an AI assistant created by Cognitivelab and trained on top of the Llama 3 large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
-     {"role": "user", "content": "Who are you?"}
- ]
-
- # Apply the Llama 3 chat template and move the token IDs to the GPU
- input_ids = tokenizer.apply_chat_template(
-     messages,
-     add_generation_prompt=True,
-     return_tensors="pt"
- ).to("cuda")
-
- outputs = model.generate(
-     input_ids,
-     max_new_tokens=256,
-     eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),  # stop at end of turn
-     do_sample=True,
-     temperature=0.6,
-     top_p=0.9,
- )
- # Decode only the newly generated tokens, skipping the prompt
- response = outputs[0][input_ids.shape[-1]:]
- print(tokenizer.decode(response, skip_special_tokens=True))
- ```
-
-
- ## Multi-turn Chat
-
- To chat with the LLama3-Gaja-Hindi-8B-v0.1 model across multiple turns, you can follow the example code below:
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from transformers import GenerationConfig, TextStreamer
-
- model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
- tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
-
- # Conversation history, seeded with the system prompt
- messages = [
-     {"role": "system", "content": "You are Gaja, an AI assistant created by Cognitivelab and trained on top of the Llama 3 large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
- ]
-
- # Append the user's input, generate a reply, and store it in the history
- def process_user_input(user_input):
-     global messages
-     messages.append({"role": "user", "content": user_input})
-
-     # Render the full conversation as a Llama 3 prompt string
-     prompt_formatted_message = tokenizer.apply_chat_template(
-         messages,
-         add_generation_prompt=True,
-         tokenize=False
-     )
-
-     # Configure generation parameters
-     generation_config = GenerationConfig(
-         repetition_penalty=1.2,
-         max_new_tokens=8000,
-         temperature=0.2,
-         top_p=0.95,
-         top_k=40,
-         bos_token_id=tokenizer.bos_token_id,
-         eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
-         pad_token_id=tokenizer.pad_token_id,
-         do_sample=True,
-         use_cache=True,
-         return_dict_in_generate=True,
-     )
-
-     # Stream tokens to stdout as they are generated
-     streamer = TextStreamer(tokenizer)
-     # The template already adds <|begin_of_text|>, so skip special tokens here
-     batch = tokenizer(prompt_formatted_message.strip(), return_tensors="pt", add_special_tokens=False)
-     print("\033[32mResponse: \033[0m")
-     generated = model.generate(
-         inputs=batch["input_ids"].to("cuda"),
-         generation_config=generation_config,
-         streamer=streamer,
-     )
-
-     # Extract the assistant's reply: the text between the last assistant
-     # header and the final <|eot_id|> token
-     assistant_response = tokenizer.decode(generated["sequences"].cpu().tolist()[0])
-     assistant_header = "<|start_header_id|>assistant<|end_header_id|>"
-     assistant_start_index = assistant_response.rfind(assistant_header)
-     eot_index = assistant_response.rfind("<|eot_id|>")
-
-     if assistant_start_index != -1 and eot_index > assistant_start_index:
-         final_response = assistant_response[assistant_start_index + len(assistant_header):eot_index].strip()
-     else:
-         raise RuntimeError("Failed to parse the assistant turn from the generated output")
-
-     # Append the extracted response to the history for the next turn
-     messages.append({"role": "assistant", "content": final_response})
-
- # Main interaction loop: empty input exits
- while True:
-     print("=================================================================================")
-     user_input = input("Input: ")
-     if not user_input.strip():
-         break
-     process_user_input(user_input)
- ```
-
- ## Prompt format
-
- System prompt: `You are Gaja, an AI assistant created by Cognitivelab and trained on top of the Llama 3 large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.`
- ```
- <|begin_of_text|><|start_header_id|>system<|end_header_id|>
-
- {{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
-
- {{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-
- {{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>
-
- {{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
- ```
-
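- The tokenizer's chat template produces this layout automatically. As a quick check, the following minimal sketch (the example messages are illustrative) renders a conversation to a raw prompt string so the special tokens are visible:
-
- ```python
- from transformers import AutoTokenizer
-
- tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
-
- messages = [
-     {"role": "system", "content": "You are Gaja, a bilingual English/Hindi assistant."},
-     {"role": "user", "content": "Namaste!"},
- ]
-
- # tokenize=False returns the formatted string instead of token IDs, so
- # <|begin_of_text|>, the header markers, and <|eot_id|> can be inspected
- prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
- print(prompt)
- ```
-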
- ## Benchmarks
-
- Coming soon.
-
- ## Bilingual Instruct Fine-tuning
-
- The model underwent a pivotal stage of supervised fine-tuning with low-rank adaptation on bilingual instruction data. This trained the model to respond in either English or Hindi depending on the language of the user's prompt or instruction; a sketch of such a setup follows.
-
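- A minimal sketch of a low-rank adaptation setup using the `peft` library is shown below; the rank, alpha, dropout, and target modules are illustrative assumptions, not the actual training configuration:
-
- ```python
- from peft import LoraConfig, get_peft_model
- from transformers import AutoModelForCausalLM
-
- base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
-
- # Hypothetical LoRA hyperparameters: low-rank adapters on the attention projections
- lora_config = LoraConfig(
-     r=16,
-     lora_alpha=32,
-     lora_dropout=0.05,
-     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
-     task_type="CAUSAL_LM",
- )
-
- # Wrap the base model; only the adapter weights receive gradients
- model = get_peft_model(base_model, lora_config)
- model.print_trainable_parameters()
- ```
-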
- ## References
-
- - [Ambari-7B-Instruct Model](https://huggingface.co/Cognitive-Lab/Ambari-7B-Instruct-v0.1)
 
+ # LLama3 Tokenizer