CISCai committed (verified) in commit 36b22a6 · Parent(s): 37a8ef0

Updated with llama-cpp-python example

Files changed (1):
  1. README.md +88 -0

README.md CHANGED
@@ -109,6 +109,94 @@ There is a similar option for V-cache (`-ctv`), however that is [not working yet
 
 For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
 
+ ## How to run from Python code
+
+ You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) module.
+
+ ### How to load this model in Python code, using llama-cpp-python
+
+ For full documentation, please see: [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/).
+
+ #### First install the package
+
+ Run one of the following commands, according to your system:
+
+ ```shell
+ # Prebuilt wheel with basic CPU support
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+ # Prebuilt wheel with NVidia CUDA acceleration (substitute cu122 etc. to match your CUDA version)
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
+ # Prebuilt wheel with Metal GPU acceleration
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+ # Build base version with no GPU acceleration
+ pip install llama-cpp-python
+ # With NVidia CUDA acceleration
+ CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
+ # Or with OpenBLAS acceleration
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Or with CLBlast acceleration
+ CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+ # Or with Metal GPU acceleration (macOS only)
+ CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+ # Or with Vulkan acceleration
+ CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
+ # Or with Kompute acceleration
+ CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
+ # Or with SYCL acceleration
+ CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
+
+ # On Windows, set the CMAKE_ARGS variable in PowerShell before installing; e.g. for NVidia CUDA:
+ $env:CMAKE_ARGS = "-DLLAMA_CUDA=on"
+ pip install llama-cpp-python
+ ```
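+
+ As a quick sanity check after installing, you can confirm the module imports and see which version you got. This is a minimal sketch, assuming a llama-cpp-python release that exposes `__version__` (recent releases do):
+
+ ```python
+ # Verify the installation; assumes llama_cpp exposes __version__
+ import llama_cpp
+
+ print(llama_cpp.__version__)
+ ```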
+
+ #### Simple llama-cpp-python example code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Chat Completion API
+
+ # Load the model; n_gpu_layers=33 offloads all layers to the GPU (use 0 for CPU only)
+ llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384)
+ print(llm.create_chat_completion(
+     messages=[
+         {
+             "role": "user",
+             "content": "What's the weather like in Oslo?"
+         }
+     ],
+     tools=[{
+         "type": "function",
+         "function": {
+             "name": "get_current_weather",
+             "description": "Get the current weather in a given location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "location": {
+                         "type": "string",
+                         "description": "The city and state, e.g. San Francisco, CA"
+                     },
+                     "unit": {
+                         "type": "string",
+                         "enum": ["celsius", "fahrenheit"]
+                     }
+                 },
+                 "required": ["location"]
+             }
+         }
+     }],
+     # tool_choice takes a single named-tool object (per the OpenAI-style schema), not a list
+     tool_choice={
+         "type": "function",
+         "function": {
+             "name": "get_current_weather"
+         }
+     }
+ ))
+ ```
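+
+ The return value follows the OpenAI-style chat completion schema, so the requested function call can be read back out of the first choice. Below is a minimal sketch of that, not part of the original example; it assumes the reply contains a `tool_calls` entry whose `arguments` field is a JSON-encoded string, as in the OpenAI format:
+
+ ```python
+ import json
+
+ def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
+     """Pull (name, arguments) pairs out of a create_chat_completion() result."""
+     message = response["choices"][0]["message"]
+     calls = message.get("tool_calls") or []
+     # `arguments` arrives as a JSON string and must be parsed before use
+     return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
+
+ # Usage: capture the result of the llm.create_chat_completion(...) call above
+ # instead of printing it, then:
+ # print(extract_tool_calls(response))
+ ```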
+
 <!-- README_GGUF.md-how-to-run end -->
 
 <!-- original-model-card start -->