Update README.md
README.md (CHANGED)
@@ -54,6 +54,83 @@ Note some portions of conversation redacted for public README purposes
**You:** no tell me about it

**Sophie:** 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.
## Hardware Requirements

The model has been tested with different precision settings to determine the minimum hardware requirements:

| Precision | VRAM Allocated | VRAM Reserved | Suitable GPUs | Notes |
|-----------|----------------|---------------|---------------|-------|
| FP32 (32-bit) | 14.24 GB | 14.24 GB | RTX 3090, 4090, A5000, A6000, etc. | Default loading mode |
| FP16 (16-bit) | 7.12 GB | 21.35 GB | RTX 3090, 4090, A5000, A6000, etc. | Recommended for most users |
| 8-bit Quantization | 5.68 GB | 6.14 GB | RTX 2060 12GB, 3060, 3070, etc. | Good balance of quality and efficiency |
| 4-bit Quantization | 2.27 GB | 2.30 GB | Most modern GPUs (GTX 1060+) | Lowest quality, runs on older hardware |
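The "VRAM Allocated" and "VRAM Reserved" columns correspond to PyTorch's two CUDA memory counters. As a minimal sketch (an addition to this card, not part of the original measurements), figures like these can be reproduced after loading the model on a CUDA device:

```python
import torch

# Assumes a model has already been loaded onto a CUDA device (see below).
# memory_allocated() counts tensors currently in use; memory_reserved()
# includes the caching allocator's pool, matching the two table columns.
allocated_gb = torch.cuda.memory_allocated() / 1024**3
reserved_gb = torch.cuda.memory_reserved() / 1024**3
print(f"VRAM allocated: {allocated_gb:.2f} GB, reserved: {reserved_gb:.2f} GB")
```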
### Recommended Loading Code

**For high-end GPUs (FP16):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half precision for the best balance of performance and quality
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    torch_dtype=torch.float16,
    device_map="auto"
)
```
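Note: `device_map="auto"` relies on the `accelerate` package to place weights automatically; if loading fails with an import error, installing it (`pip install accelerate`) is typically the fix.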
**For mid-range GPUs (8-bit):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

# Load in 8-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```
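The 8-bit and 4-bit paths additionally depend on the `bitsandbytes` library (`pip install bitsandbytes`), which at the time of writing targets CUDA GPUs.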
**For low-end GPUs (4-bit):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load in 4-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```
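If 4-bit output quality matters, `BitsAndBytesConfig` also exposes NF4 and nested quantization. The variant below is a suggestion beyond the original card, not part of it:

```python
# Optional variant (not in the original card): NF4 with nested quantization
# often recovers some quality at a similar memory cost.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)
```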
**For CPU-only inference** (much slower, but works on any system):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same model ID as above; tokenizer loading matches the GPU examples
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    device_map="cpu"
)
```
Note: Lower precision (8-bit and 4-bit) may result in slightly reduced output quality, but the difference is often minimal for most use cases.
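The snippets above stop at loading. A minimal generation sketch follows, reusing the `model` and `tokenizer` loaded above; the chat formatting is an assumption (it relies on the tokenizer shipping a chat template for `apply_chat_template`), so check the prompt format documented elsewhere in this README before depending on it.

```python
# Minimal inference sketch; the chat-template usage is an assumption,
# adjust to the model's documented prompt format if it differs.
messages = [{"role": "user", "content": "no tell me about it"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```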
## Model Description