Mr-Vicky-01 committed
Commit 60dd74b · verified · 1 Parent(s): b07b9a2

Update README.md

Files changed (1):
  1. README.md +72 -6
README.md CHANGED
@@ -12,12 +12,78 @@ language:
  - en
  ---

- # Uploaded model
-
- - **Developed by:** Mr-Vicky-01
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen2.5-0.5b-instruct-bnb-4bit
-
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ ## INFERENCE
+
+ ```python
+ # Load model directly
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ tokenizer = AutoTokenizer.from_pretrained("Mr-Vicky-01/qwen-conversational-finetuned")
+ model = AutoModelForCausalLM.from_pretrained("Mr-Vicky-01/qwen-conversational-finetuned")
+
+ # System message in the Qwen chat format
+ prompt = """
+ <|im_start|>system\nYou are a helpful AI assistant named Securitron<|im_end|>
+ """
+
+ # Keep a list of the most recent conversation exchanges
+ conversation_history = []
+
+ while True:
+     user_prompt = input("User Question: ")
+     if user_prompt.lower() == 'break':
+         break
+
+     # Format the user's input in the Qwen chat format
+     user = f"""<|im_start|>user
+ {user_prompt}<|im_end|>
+ <|im_start|>assistant"""
+
+     # Add the user's question to the conversation history
+     conversation_history.append(user)
+
+     # Keep only the most recent turns: the last two completed exchanges plus the new question
+     conversation_history = conversation_history[-5:]
+
+     # Build the full prompt
+     current_prompt = prompt + "\n".join(conversation_history)
+
+     # Tokenize the prompt
+     encodeds = tokenizer(current_prompt, return_tensors="pt", truncation=True).input_ids
+
+     # Move model and inputs to the appropriate device
+     device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+     model.to(device)
+     inputs = encodeds.to(device)
+
+     # generated_ids holds the prompt plus every token generated so far
+     generated_ids = inputs
+
+     # Generate tokens one at a time so the reply can be streamed to the console
+     assistant_response = ""
+     for _ in range(512):  # Maximum number of streamed tokens
+         next_token = model.generate(
+             generated_ids,
+             max_new_tokens=1,
+             pad_token_id=151644,   # <|im_start|>
+             eos_token_id=151645,   # <|im_end|>
+             num_return_sequences=1,
+             do_sample=True,
+             top_k=50,
+             temperature=0.2,
+             top_p=0.90
+         )
+
+         generated_ids = torch.cat([generated_ids, next_token[:, -1:]], dim=1)
+         token_id = next_token[0, -1].item()
+         token = tokenizer.decode([token_id], skip_special_tokens=True)
+
+         assistant_response += token
+         print(token, end="", flush=True)
+
+         if token_id == 151645:  # Stop at <|im_end|>
+             break
+
+     print()
+     conversation_history.append(f"{assistant_response.strip()}<|im_end|>")
+ ```
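
The loop in the card calls `model.generate` once per token, so the growing prompt is re-encoded on every step. If the uploaded tokenizer ships a Qwen-style chat template (the base instruct model it was finetuned from does, but that is an assumption about this repo), a simpler sketch is to build the prompt with `apply_chat_template` and stream a full reply from a single `generate` call using `TextStreamer`. This is an untested alternative, not part of the original card; the message contents below are placeholders.

```python
# Untested sketch: single-call streaming via the tokenizer's chat template.
# Assumes "Mr-Vicky-01/qwen-conversational-finetuned" provides a Qwen-style chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

model_id = "Mr-Vicky-01/qwen-conversational-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

# Placeholder conversation; replace with your own turns
messages = [
    {"role": "system", "content": "You are a helpful AI assistant named Securitron"},
    {"role": "user", "content": "Who are you?"},
]

# Let the chat template insert the <|im_start|>/<|im_end|> markers and the assistant prefix
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# TextStreamer prints tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

output_ids = model.generate(
    input_ids,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.90,
)

# The reply is everything generated after the prompt
reply = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
```

Because a single `generate` call reuses its KV cache internally, this avoids re-running the prompt for every emitted token; the sampling settings mirror the ones used in the card's loop.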