AdithyaSK/LLama3Tokenizer

@@ -1,172 +1,8 @@
 ---
 library_name: transformers
-tags:
-- hindi
-- bilingual
 license: llama2
 language:
-- hi
 - en
 ---
-# LLama3-Gaja-Hindi-8B-v0.1
-## Overview
-LLama3-Gaja-Hindi-8B-v0.1 is an extension of the Ambari series, a bilingual English/Hindi model developed and released by [Cognitivelab.in](https://www.cognitivelab.in/). This model is specialized for natural language understanding tasks, particularly in the context of instructional pairs. It is built upon the [Llama3 8b](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model, utilizing a fine-tuning process with a curated dataset of translated instructional pairs.
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6442d975ad54813badc1ddf7/G0u9L6RQJFinST0chQmfL.jpeg" width="500px">
-## Generate
-```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
-tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
-# Existing messages list
-messages = [
-    {"role": "system", "content": " You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
-    {"role": "user", "content": "Who are you"}
-]
-input_ids = tokenizer.apply_chat_template(
-    messages,
-    add_generation_prompt=True,
-    # tokenize=False,
-    return_tensors="pt"
-).to("cuda")
-outputs = model.generate(
-    input_ids,
-    max_new_tokens=256,
-    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
-    do_sample=True,
-    temperature=0.6,
-    top_p=0.9,
-)
-response = outputs[0][input_ids.shape[-1]:]
-print(tokenizer.decode(response, skip_special_tokens=True))
-```
-## Multi-turn Chat
-To use the LLama3-Gaja-Hindi-8B-v0.1 model in a multi-turn chat, follow the example code below:
-```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from transformers import GenerationConfig, TextStreamer
-model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
-tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
-# Existing messages list
-messages = [
-    {"role": "system", "content": " You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
-]
-# Function to add user input and generate response
-def process_user_input(user_input):
-    global messages
-    # Add user's input to messages list
-    messages.append({"role": "user", "content": user_input})
-    # Prepare the prompt for generation
-    prompt_formatted_message = tokenizer.apply_chat_template(
-        messages,
-        add_generation_prompt=True,
-        tokenize=False
-    )
-    # Configure generation parameters
-    generation_config = GenerationConfig(
-        repetition_penalty=1.2,
-        max_new_tokens=8000,
-        temperature=0.2,
-        top_p=0.95,
-        top_k=40,
-        bos_token_id=tokenizer.bos_token_id,
-        eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
-        pad_token_id=tokenizer.pad_token_id,
-        do_sample=True,
-        use_cache=True,
-        return_dict_in_generate=True,
-        output_attentions=False,
-        output_hidden_states=False,
-        output_scores=False,
-    )
-    streamer = TextStreamer(tokenizer)
-    batch = tokenizer(str(prompt_formatted_message.strip()), return_tensors="pt")
-    print("\033[32mResponse: \033[0m")  # Print a green "Response:" label before streaming
-    # Generate response
-    generated = model.generate(
-        inputs=batch["input_ids"].to("cuda"),
-        generation_config=generation_config,
-        streamer=streamer,
-    )
-    # Extract and format assistant's response
-    # print(tokenizer.decode(generated["sequences"].cpu().tolist()[0]))
-    assistant_response = tokenizer.decode(generated["sequences"].cpu().tolist()[0])
-    # Find the last assistant header and the last <|eot_id|> marker
-    assistant_start_index = assistant_response.rfind("<|start_header_id|>assistant<|end_header_id|>")
-    empty_string_index = assistant_response.rfind("<|eot_id|>")
-    # Extract the text between the last assistant header and the <|eot_id|> that follows it
-    if assistant_start_index != -1 and empty_string_index > assistant_start_index:
-        final_response = assistant_response[assistant_start_index + len("<|start_header_id|>assistant<|end_header_id|>") : empty_string_index].strip()
-    else:
-        # Fail loudly: a bare `assert "..."` on a non-empty string always passes
-        raise RuntimeError("Failed to parse the multi-turn prompt format")
-    # Append the extracted response to the messages list
-    messages.append({"role": "assistant", "content": final_response})
-    # messages.append({"role": "assistant", "content": assistant_response})
-    # Print assistant's response
-    # print(f"Assistant: {assistant_response}")
-# Main interaction loop
-while True:
-    print("=================================================================================")
-    user_input = input("Input: ")  # Prompt user for input
-    # Check if user_input is empty
-    if not user_input.strip():  # .strip() removes any leading or trailing whitespace
-        break  # Break out of the loop if input is empty
-    process_user_input(user_input)  # Generate and stream the assistant's reply
-```
-## Prompt format
-System prompt: `You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.`
-```
-<|begin_of_text|><|start_header_id|>system<|end_header_id|>
-{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
-{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>
-{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-```
-## Benchmarks
-Coming soon.
-## Bilingual Instruct Fine-tuning
-The model underwent a pivotal stage of supervised fine-tuning with low-rank adaptation, focusing on bilingual instruct fine-tuning. This approach involved training the model to respond adeptly in either English or Hindi based on the language specified in the user prompt or instruction.
-## References
-- [Ambari-7B-Instruct Model](https://huggingface.co/Cognitive-Lab/Ambari-7B-Instruct-v0.1)

+# LLama3 Tokenizer
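
The special tokens this tokenizer carries delimit the chat layout shown in the removed prompt-format section. As a rough, dependency-free sketch of that layout, the snippet below assembles the prompt string by hand. `build_llama3_prompt` is a hypothetical helper, not part of `transformers`. The blank line after each header and the trimming of message content are assumptions about the chat template, and the shortened system prompt is only for illustration. In practice `tokenizer.apply_chat_template` should be preferred.

```python
# Hypothetical helper for illustration; not part of transformers.
BEGIN_OF_TEXT = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
EOT = "<|eot_id|>"


def build_llama3_prompt(messages, add_generation_prompt=True, header_sep="\n\n"):
    """Join chat messages into a single Llama 3 style prompt string.

    header_sep is the text placed between a role header and the message
    content; "\n\n" is an assumption about the template, so adjust it if
    the tokenizer's own chat template differs.
    """
    parts = [BEGIN_OF_TEXT]
    for message in messages:
        # Role header, e.g. <|start_header_id|>user<|end_header_id|>
        parts.append(f"{START_HEADER}{message['role']}{END_HEADER}{header_sep}")
        # Message body (trimmed, by assumption), closed by the end-of-turn token
        parts.append(f"{message['content'].strip()}{EOT}")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append(f"{START_HEADER}assistant{END_HEADER}{header_sep}")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Gaja."},  # shortened for illustration
    {"role": "user", "content": "Who are you"},
]
prompt = build_llama3_prompt(messages)
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` is a quick way to check whether the assumed separator matches the template actually shipped with the tokenizer.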