Update README.md
README.md (CHANGED)
@@ -54,6 +54,83 @@ Note some portions of conversation redacted for public README purposes
**You:** no tell me about it

**Sophie:** 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.
## Hardware Requirements

The model has been tested with different precision settings to determine the minimum hardware requirements:

| Precision | VRAM Allocated | VRAM Reserved | Suitable GPUs | Notes |
|-----------|----------------|---------------|---------------|-------|
| FP32 (32-bit) | 14.24 GB | 14.24 GB | RTX 3090, 4090, A5000, A6000, etc. | Default loading mode |
| FP16 (16-bit) | 7.12 GB | 21.35 GB | RTX 3090, 4090, A5000, A6000, etc. | Recommended for most users |
| 8-bit Quantization | 5.68 GB | 6.14 GB | RTX 2060 12GB, 3060, 3070, etc. | Good balance of quality and efficiency |
| 4-bit Quantization | 2.27 GB | 2.30 GB | Most modern GPUs (GTX 1060+) | Lowest quality, runs on older hardware |
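The "VRAM Allocated" and "VRAM Reserved" columns correspond to PyTorch's two CUDA memory counters. As a minimal sketch (an addition to this card, not part of the original measurements), figures like these can be reproduced after loading the model on a CUDA device:

```python
import torch

# Assumes a model has already been loaded onto a CUDA device (see below).
# memory_allocated() counts tensors currently in use; memory_reserved()
# includes the caching allocator's pool, matching the two table columns.
allocated_gb = torch.cuda.memory_allocated() / 1024**3
reserved_gb = torch.cuda.memory_reserved() / 1024**3
print(f"VRAM allocated: {allocated_gb:.2f} GB, reserved: {reserved_gb:.2f} GB")
```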
### Recommended Loading Code

**For high-end GPUs (FP16):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half precision for the best balance of performance and quality
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    torch_dtype=torch.float16,
    device_map="auto"
)
```
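Note: `device_map="auto"` relies on the `accelerate` package to place weights automatically; if loading fails with an import error, installing it (`pip install accelerate`) is typically the fix.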
**For mid-range GPUs (8-bit):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

# Load in 8-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```
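The 8-bit and 4-bit paths additionally depend on the `bitsandbytes` library (`pip install bitsandbytes`), which at the time of writing targets CUDA GPUs.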
**For low-end GPUs (4-bit):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load in 4-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```
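If 4-bit output quality matters, `BitsAndBytesConfig` also exposes NF4 and nested quantization. The variant below is a suggestion beyond the original card, not part of it:

```python
# Optional variant (not in the original card): NF4 with nested quantization
# often recovers some quality at a similar memory cost.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)
```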
**For CPU-only inference** (much slower, but works on any system):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same model ID as above; tokenizer loading matches the GPU examples
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    device_map="cpu"
)
```
Note: Lower precision (8-bit and 4-bit) may result in slightly reduced output quality, but the difference is often minimal for most use cases.
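The snippets above stop at loading. A minimal generation sketch follows, reusing the `model` and `tokenizer` loaded above; the chat formatting is an assumption (it relies on the tokenizer shipping a chat template for `apply_chat_template`), so check the prompt format documented elsewhere in this README before depending on it.

```python
# Minimal inference sketch; the chat-template usage is an assumption,
# adjust to the model's documented prompt format if it differs.
messages = [{"role": "user", "content": "no tell me about it"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```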
## Model Description