---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- roleplay
- phi-3
- conversational
- phi-3-mini
- character
- lora
- fine-tuned
base_model: microsoft/phi-3-mini-4k-instruct
model-index:
- name: phi3-uncensored-chat
  results: []
---

# phi3-uncensored-chat
![Header Image](https://huggingface.co/magicsquares137/phi3-uncensored-chat/resolve/main/00380-3290958654.png)

This model is a fine-tuned version of [microsoft/phi-3-mini-4k-instruct](https://huggingface.co/microsoft/phi-3-mini-4k-instruct) optimized for roleplaying conversations with a variety of character personas. The model speaks in a conversational format. Please note: the prompt template guidelines are extremely important for getting usable output.

## Example Conversations

### Prompt Template Guidelines

**Important: This model requires a specific prompt format to function properly.**

For best results, follow this exact template structure:

```
Character's Persona: [detailed persona description]
User: [user message]
Assistant:
```

The model will complete the response after "Assistant:". This specific format is critical - using standard chat templates or different formatting will result in poor or nonsensical outputs.

### Key Points:
- Start with the character description using "Character's Persona:" format
- Each user message must be prefixed with "User: "
- Each model response will be after "Assistant:"
- The model often includes emoji prefixes (👅, 😈, 🔥, etc.) in its responses
- Keep the exact formatting including newlines as shown above

Detailed code examples showing implementation are provided in the Usage section below.
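
As a quick preview, here is a minimal sketch of assembling that prompt in Python (the persona and message values are illustrative only):

```python
# Minimal prompt assembly following the template above (values are illustrative)
persona = "Astrid's Persona: Astrid is a flirtatious, confident companion."
user_message = "hi there"

prompt = f"{persona}\nUser: {user_message}\nAssistant:"
print(prompt)
```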

### Example 1: Astrid Character
Note: some portions of the conversation are redacted for this public README.

**You:** hi there  
**Astrid:** 😈 Hey handsome, how can I help you today?

**You:** yeah just wanted to see how you're doing :)  
**Astrid:** 🔥 I'm always [redacted]. What do you want me to do for you today?

**You:** mm i dont know. ?  
**Astrid:** 👅 Alright then... Let's [redacted]. Tell me [redacted].

### Example 2: Sophie Character (Knowledge Assistant)

**You:** i lost a lot of money in crypto  
**Sophie:** 🙄 Sorry to hear that. How long have you been interested in cryptocurrencies?

**You:** mm im not really  
**Sophie:** 🤔 Ahh okay then. Well, let me ask you this - have you ever heard of Bitcoin?

**You:** yes i have, why do you ask?  
**Sophie:** 👍 Good! Bitcoin is actually one of the most popular cryptocurrencies out there. Do you know anything about how it works?

**You:** no tell me about it  
**Sophie:** 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.

## Hardware Requirements

The model has been tested with different precision settings to determine the minimum hardware requirements:

| Precision | VRAM Allocated | VRAM Reserved | Suitable GPUs | Notes |
|-----------|----------------|---------------|---------------|-------|
| FP32 (32-bit) | 14.24 GB | 14.24 GB | RTX 3090, 4090, A5000, A6000, etc. | Default loading mode |
| FP16 (16-bit) | 7.12 GB | 21.35 GB | RTX 3090, 4090, A5000, A6000, etc. | Recommended for most users |
| 8-bit Quantization | 5.68 GB | 6.14 GB | RTX 2060 12GB, 3060, 3070, etc. | Good balance of quality and efficiency |
| 4-bit Quantization | 2.27 GB | 2.30 GB | Most modern GPUs (GTX 1060+) | Lowest quality, runs on older hardware |

### Recommended Loading Code

**For high-end GPUs (FP16):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half precision for best balance of performance and quality
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

**For mid-range GPUs (8-bit):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

# Load in 8-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```

**For low-end GPUs (4-bit):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load in 4-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```

**For CPU-only inference** (much slower but works on any system):
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    device_map="cpu"
)
```

Note: Lower precision (8-bit and 4-bit) may result in slightly reduced output quality, but the difference is often minimal for most use cases.
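
The "VRAM Allocated" and "VRAM Reserved" figures in the table above correspond to PyTorch's CUDA memory counters; here is a minimal sketch for reproducing the measurement on your own hardware, assuming one of the loading snippets above has already run on a CUDA device:

```python
import torch

# Report PyTorch's CUDA memory counters after the model has been loaded
if torch.cuda.is_available():
    allocated_gb = torch.cuda.memory_allocated() / 1024**3
    reserved_gb = torch.cuda.memory_reserved() / 1024**3
    print(f"VRAM allocated: {allocated_gb:.2f} GB")
    print(f"VRAM reserved:  {reserved_gb:.2f} GB")
```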


## Model Description

The model has been optimized to maintain persona consistency while remaining capable of adopting different characters. It excels at creative, character-driven conversations and adapts readily to the personality traits provided in the system prompt.

### Training Data
We are unable to open-source the dataset at this time because it is used for proprietary internal luvgpt development. Initial conversations were generated by open-source large language models following specific generation instructions and were curated by a judge model.

- **Dataset Size**: ~13k high-quality examples (curated from 50k initial conversations)
- **Data Format**: JSONL, with each entry containing a messages array with system, user, and assistant roles (see the illustrative sketch below)
- **Data Curation**: A judge model was used to score and filter the initial dataset, keeping only the highest quality examples that demonstrated strong persona consistency and engaging responses
- **Data Characteristics**: Average message length of ~240 tokens, with conversations typically containing 6-7 messages
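
For illustration, a single entry in the format described above might look like the following when built in Python (the content is hypothetical; the actual dataset is not public):

```python
import json

# Hypothetical training entry; the real dataset is proprietary and not public
example_entry = {
    "messages": [
        {"role": "system", "content": "Sophie's Persona: Sophie is a knowledgeable virtual assistant..."},
        {"role": "user", "content": "Hi Sophie, can you tell me about yourself?"},
        {"role": "assistant", "content": "😊 Sure! I love explaining complex tech in simple terms."},
    ]
}

# Each line of the JSONL file holds one such object
print(json.dumps(example_entry, ensure_ascii=False))
```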

## Performance

Training metrics show consistent improvement throughout the training process:

![Training Loss](https://huggingface.co/magicsquares137/phi3-uncensored-chat/resolve/main/W%26B%20Chart%203_18_2025%2C%203_18_10%20PM.png)

![Token Accuracy](https://huggingface.co/magicsquares137/phi3-uncensored-chat/resolve/main/W%26B%20Chart%203_18_2025%2C%203_18_35%20PM.png)

- **Token Accuracy**: Improved from ~0.48 to ~0.73
- **Training Loss**: Decreased from ~2.2 to ~1.05
- **Convergence**: Model showed strong convergence by the end of training

## Training Details

- **Base Model**: microsoft/phi-3-mini-4k-instruct
- **Method**: LoRA/DeepSpeed fine-tuning with the following parameters (see the configuration sketch below):
  - LoRA rank: 16
  - LoRA alpha: 32
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Process**:
  - Hardware: Single NVIDIA GPU with 24GB VRAM
  - Training time: ~3 hours
  - Optimizer: AdamW with DeepSpeed ZeRO stage 2 optimization
  - Learning rate: 2e-4 with cosine schedule
  - Batch size: 8 (effective)
  - Number of epochs: 3
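
The exact training script is not published, but a sketch of the LoRA configuration implied by the hyperparameters above, using the peft library, might look like this (fields beyond those listed are assumptions, not confirmed):

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type is an assumption for causal LM fine-tuning
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```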

## Usage

This model works best with a specific prompt format that differs from the standard chat template format. Use the raw format below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "luvgpt/phi3-uncensored-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define character persona - you can customize this!
persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."

# Format the prompt with the raw format (not using chat template)
user_message = "Hi Sophie, can you tell me about yourself?"
prompt = f"{persona}\nUser: {user_message}\nAssistant:"

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Process the output
full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = full_output[len(prompt):].strip()

# Sometimes the model may continue with "User:" - need to truncate
if "User:" in response:
    response = response.split("User:")[0].strip()

print(f"Character: {response}")
```

## Interactive Chat Interface

For a more interactive experience, you can use this simple chat interface:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class CharacterChat:
    def __init__(self, model_path="luvgpt/phi3-uncensored-chat", persona=None):
        print(f"Loading model from {model_path}...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        # Default persona or use provided one
        if persona is None:
            self.persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."
        else:
            self.persona = persona
            
        self.conversation_history = []
        print(f"Character is ready to chat!")
        
    def chat(self, message):
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": message})
        
        # Format the conversation in the raw format that works
        raw_prompt = f"{self.persona}\n"
        
        # Add conversation history
        for msg in self.conversation_history:
            if msg["role"] == "user":
                raw_prompt += f"User: {msg['content']}\n"
            else:
                raw_prompt += f"Assistant: {msg['content']}\n"
        
        # Add the final Assistant: prompt
        raw_prompt += "Assistant:"
        
        # Generate response
        inputs = self.tokenizer(raw_prompt, return_tensors="pt").to(self.model.device)
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
        # Decode full output
        full_output = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        
        # Extract just the response
        try:
            response = full_output[len(raw_prompt):].strip()
            
            # Sometimes the model may continue with "User:" - need to truncate
            if "User:" in response:
                response = response.split("User:")[0].strip()
                
            # Store the response in conversation history
            self.conversation_history.append({"role": "assistant", "content": response})
            
            return response
        except Exception:
            return "Error extracting response"
    
    def reset_conversation(self):
        self.conversation_history = []
        return "Conversation has been reset."

# Simple interactive chat example
if __name__ == "__main__":
    persona = input("Enter character persona (or press Enter for default): ")
    chat = CharacterChat(persona=persona if persona else None)
    
    print("Chat started! Type 'quit' to exit or 'reset' to restart conversation.")
    
    while True:
        user_input = input("\nYou: ")
        
        if user_input.lower() == 'quit':
            print("Goodbye!")
            break
        elif user_input.lower() == 'reset':
            print(chat.reset_conversation())
            continue
            
        response = chat.chat(user_input)
        print(f"\nCharacter: {response}")
```

## Model Limitations

- The model works best with the specific prompt format demonstrated above
- While the model can adapt to different personas, it maintains some stylistic elements (like emoji usage) across characters
- The model has a context window limited to 4k tokens, inherited from the base Phi-3 model (see the trimming sketch below)
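
Because of the 4k-token window, long conversations can overflow the context. A minimal sketch of trimming older turns before generation (a hypothetical helper, assuming the tokenizer and history format from the CharacterChat class above):

```python
def trim_history(tokenizer, persona, history, max_tokens=3900):
    """Drop the oldest turns until the assembled prompt fits the 4k context window."""
    while history:
        prompt = persona + "\n" + "".join(
            f"{'User' if m['role'] == 'user' else 'Assistant'}: {m['content']}\n"
            for m in history
        ) + "Assistant:"
        if len(tokenizer(prompt).input_ids) <= max_tokens:
            return prompt
        history = history[1:]  # drop the oldest message first
    return persona + "\nAssistant:"
```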

## Ethical Considerations

This model is intended for creative fiction writing and roleplaying scenarios between consenting adults. Users should follow platform guidelines and local regulations when deploying this model.

## Acknowledgements

- Based on Microsoft's Phi-3 Mini model
- Training methodology inspired by various LoRA fine-tuning approaches
- Special thanks to the open-source AI community