## AMchat GGUF Model

AM (Advanced Mathematics) Chat is a large language model that integrates mathematical knowledge, advanced mathematics problems, and their solutions. It is based on the InternLM2-Math-7B model and was fine-tuned with XTuner on a dataset that combines general math problems and advanced mathematics problems with their analyses, making it purpose-built for solving advanced mathematics problems. This repository provides GGUF quantizations of the model.

## Latest Release

2024-08-16

- **Q6_K**
- **Q5_K_M**
- **Q5_0**
- **Q4_0**
- **Q3_K_M**
- **Q2_K**

2024-08-09

- **F16**: A balanced trade-off between model size and performance; ideal for applications that need near-full precision with reduced resource consumption.
- **Q8_0**: A substantial reduction in model size while maintaining high accuracy; suitable for environments with stringent memory constraints.
- **Q4_K_M**: The most compact of the initial releases, with minimal impact on performance; well suited to resource-constrained deployments.
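
These are all standard GGUF quantization types, so if you ever need one that is not published here, you can produce it yourself from the F16 file with llama.cpp's `llama-quantize` tool (built in the llama-cli section below). A minimal sketch, assuming the F16 file sits in the llama.cpp directory:

```shell
# Requantization sketch -- the published files already cover these types.
build/bin/llama-quantize AMchat-fp16.gguf AMchat-q5_k_m.gguf Q5_K_M
```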

## Getting Started - Ollama

To get started with AMchat in [Ollama](https://github.com/ollama/ollama), follow these steps:

1. **Clone the Repository**

   ```bash
   git clone https://huggingface.co/axyzdong/AMchat-GGUF
   cd AMchat-GGUF
   ```

2. **Create the Model**

   ```bash
   ollama create AMchat -f Modelfile
   ```

   (A sketch of what a `Modelfile` contains follows these steps.)

3. **Run**

   ```bash
   ollama run AMchat
   ```
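
The `ollama create` step reads the `Modelfile` in the cloned repository. For orientation only, a minimal Modelfile for a GGUF checkpoint might look like the sketch below; the file name, template, and parameters are illustrative assumptions, and the repository's own `Modelfile` is authoritative. The ChatML-style tags match the prompt used in the llama-cli example further down.

```
# Illustrative sketch only -- use the Modelfile shipped with this repository.
FROM ./AMchat-q8_0.gguf

# InternLM2-style ChatML template
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER temperature 0.8
PARAMETER stop "<|im_end|>"
```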

## Getting Started - llama-cli

You can also run inference with `llama-cli`. For a detailed explanation of `llama-cli`, please refer to [this guide](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).

### Installation

We recommend building `llama.cpp` from source. The following snippets give an example for the Linux CUDA platform; for instructions on other platforms, please refer to the [official guide](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build).

- Step 1: create a conda environment and install CMake

  ```shell
  conda create --name AMchat python=3.10 -y
  conda activate AMchat
  pip install cmake
  ```

- Step 2: clone the source code and build the project (the CUDA build is shown; a CPU-only variant follows the list)

  ```shell
  git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
  cd llama.cpp
  cmake -B build -DGGML_CUDA=ON
  cmake --build build --config Release -j
  ```
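
If you do not have a CUDA-capable GPU, the same build works without the CUDA flag; this produces CPU-only binaries (other backends are covered in the official guide linked above):

```shell
# CPU-only build: identical to the CUDA build, minus -DGGML_CUDA=ON
cmake -B build
cmake --build build --config Release -j
```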

All the built targets can be found in the subdirectory `build/bin`.

In the following sections, we assume that the working directory is the root directory of `llama.cpp`.

### Download models

You can download whichever model file fits your requirements. For instance, `AMchat-q8_0.gguf` can be downloaded as follows:

```shell
pip install huggingface-hub
huggingface-cli download axyzdong/AMchat-GGUF AMchat-q8_0.gguf --local-dir . --local-dir-use-symlinks False
```

### Chat example

The command below uses `AMchat-fp16.gguf`; substitute whichever file you downloaded (e.g. `AMchat-q8_0.gguf`), and adjust `--gpu-layers` to fit your GPU memory:

```shell
build/bin/llama-cli \
    --model AMchat-fp16.gguf \
    --predict 512 \
    --ctx-size 4096 \
    --gpu-layers 24 \
    --temp 0.8 \
    --top-p 0.8 \
    --top-k 50 \
    --seed 1024 \
    --color \
    --prompt "<|im_start|>system\nYou are an expert in advanced math and you can answer all kinds of advanced math problems.<|im_end|>\n" \
    --interactive \
    --multiline-input \
    --conversation \
    --verbose \
    --logdir workdir/logdir \
    --in-prefix "<|im_start|>user\n" \
    --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
```
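
If you would rather query the model over HTTP than chat in the terminal, the same build also produces `llama-server`, which exposes an OpenAI-compatible API. A minimal sketch with the downloaded model (port and layer count are illustrative):

```shell
# Serve the model with an OpenAI-compatible API under /v1
build/bin/llama-server \
    --model AMchat-q8_0.gguf \
    --ctx-size 4096 \
    --gpu-layers 24 \
    --port 8080
```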

## Star Us

If you find AMchat useful, please ⭐ Star this repository and help others discover it!