---
license: apache-2.0
---

# THUDM's chatglm2 6B GGML

These files are GGML format model files for [THUDM's chatglm2 6B](https://huggingface.co/THUDM/chatglm2-6b).

GGML files are for CPU + GPU inference using [chatglm.cpp](https://github.com/li-plus/chatglm.cpp) and xorbits-inference (coming soon).
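
For example, a minimal Python sketch of running one of these files with the chatglm.cpp Python bindings (published on PyPI as `chatglm-cpp`) could look like the following. The local file name `chatglm2-ggml.bin` is a placeholder, and the exact `chat()` signature has varied between chatglm.cpp versions, so treat this as a sketch rather than a verified recipe:

```python
# Minimal sketch, assuming `pip install chatglm-cpp` and a GGML file from
# this repo saved locally; the file name below is a placeholder.
import chatglm_cpp

# Pipeline loads the GGML weights and tokenizer in one step.
pipeline = chatglm_cpp.Pipeline("./chatglm2-ggml.bin")

# Early chatglm.cpp versions take the chat history as a list of strings;
# newer versions expect ChatMessage objects instead.
print(pipeline.chat(["What is GGML?"]))
```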

# Prompt template

**NOTE**: A prompt template is not available yet, since the system prompt is currently hard-coded in chatglm.cpp.

# Provided files

| Name | Quant method | Bits | Size |
|------|--------------|------|------|
| | | | |
| | | | |

# How to run in xorbits-inference

Coming soon.

# Slack

For further support, and for discussions on these models and AI in general, join our [Slack channel](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)!

# Original model card: THUDM's chatglm2 6B

ChatGLM**2**-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B). It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features:

1. **Stronger Performance**: Building on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of [GLM](https://github.com/THUDM/GLM) and has undergone pre-training on 1.4T bilingual tokens followed by human preference alignment training. The [evaluation results](README.md#evaluation-results) show that, compared to the first-generation model, ChatGLM2-6B achieves substantial performance improvements on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), showing strong competitiveness among models of the same size.
2. **Longer Context**: Based on the [FlashAttention](https://github.com/HazyResearch/flash-attention) technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 8K during dialogue alignment, allowing for more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round ultra-long documents, which we will focus on optimizing in future iterations.
3. **More Efficient Inference**: Based on the [Multi-Query Attention](http://arxiv.org/abs/1911.02150) technique, ChatGLM2-6B achieves faster inference and lower GPU memory usage: under the official implementation, inference speed has increased by 42% compared to the first generation, and under INT4 quantization, the dialogue length supported by 6G of GPU memory has increased from 1K to 8K (a minimal sketch of multi-query attention follows this list).
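
To illustrate what multi-query attention changes relative to standard multi-head attention, here is a minimal, self-contained numpy sketch. It is not ChatGLM2-6B's actual implementation; the names, shapes, and the omission of an output projection are all simplifying assumptions. The point is that every query head shares a single K/V projection, so the per-token KV cache shrinks by a factor of `n_head`:

```python
import numpy as np

def softmax(scores, axis=-1):
    scores = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=axis, keepdims=True)

def multi_query_attention(x, w_q, w_k, w_v, n_head):
    # x: (seq, d_model). Queries get n_head projections, but K and V get a
    # single shared d_head-sized projection, so the KV cache holds n_head
    # times fewer values than in standard multi-head attention.
    seq, d_model = x.shape
    d_head = d_model // n_head
    q = (x @ w_q).reshape(seq, n_head, d_head)       # (seq, n_head, d_head)
    k = x @ w_k                                      # (seq, d_head), shared by all heads
    v = x @ w_v                                      # (seq, d_head), shared by all heads
    mask = np.triu(np.full((seq, seq), -np.inf), 1)  # causal mask
    heads = []
    for h in range(n_head):                          # every query head reuses the same k, v
        scores = q[:, h, :] @ k.T / np.sqrt(d_head) + mask
        heads.append(softmax(scores) @ v)            # (seq, d_head)
    return np.concatenate(heads, axis=-1)            # (seq, d_model)

# Toy usage: 4 query heads sharing one 16-dim K/V head.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))
out = multi_query_attention(
    x,
    w_q=rng.normal(size=(64, 64)),
    w_k=rng.normal(size=(64, 16)),
    w_v=rng.normal(size=(64, 16)),
    n_head=4,
)
print(out.shape)  # (8, 64)
```

In this toy configuration the cached K and V per token are 2 × 16 values instead of the 2 × 64 that standard multi-head attention would store, which is where the memory savings come from and why longer dialogues fit in the same GPU memory.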

For more instructions, including how to run the CLI and web demos and how to quantize the model, please refer to our [GitHub repo](https://github.com/THUDM/ChatGLM2-6B).