taoki/TinySwallow-1.5B-Instruct-w8a16


taoki committed on Feb 2

Commit 05683aa · verified · 1 Parent(s): edaa9be

Update README.md

Files changed (1)

README.md +57 -3


@@ -1,3 +1,57 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- ja
+pipeline_tag: text-generation
+base_model: SakanaAI/TinySwallow-1.5B-Instruct
+datasets:
+- tokyotech-llm/lmsys-chat-1m-synth
+- tokyotech-llm/swallow-magpie-ultra-v0.1
+- tokyotech-llm/swallow-swallow-gemma-magpie-v0.1
+tags:
+- M5Stack Module LLM
+---
+# TinySwallow-1.5B-Instruct-w8a16
+This model is [SakanaAI/TinySwallow-1.5B-Instruct](https://huggingface.co/SakanaAI/TinySwallow-1.5B-Instruct),
+converted with [ax-llm-build](https://github.com/AXERA-TECH/ax-llm-build) for the [M5Stack Module LLM](https://docs.m5stack.com/ja/module/Module-LLM).
+For the detailed conversion procedure, see the [pulsar2 documentation](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html#large-model-compilation-experimental-stage).
+The uploaded `axmodel` files were converted with the following command.
+```bash
+pulsar2 llm_build --input_path /path/to/TinySwallow-1.5B-Instruct/ --output_path /path/to/TinySwallow-1.5B-Instruct-ax630c --kv_cache_len 1653 --model_config /path/to/TinySwallow-1.5B-Instruct/config.json --hidden_state_type bf16 --chip AX620E --prefill_len 128
+```
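+Note: the pulsar2 container used was version 3.3.
+## Usage
+Obtain `main_prefill` separately and deploy it to the Module LLM together with the files in this repository.
+- For example, from [AXERA-TECH/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/AXERA-TECH/DeepSeek-R1-Distill-Qwen-1.5B)
+Following the DeepSeek-R1 example, start the tokenizer as an HTTP server beforehand (in a separate terminal or in the background).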
+```bash
+python3 tinyswallow_tokenizer.py
+```
+After the tokenizer has started, run the debug script.
+```bash
+./run_TinySwallow_1.5B_prefill_ax630c.sh
+[I][                            Init][ 125]: LLM init start
+bos_id: -1, eos_id: 151645
+  3% | ██                                |   1 /  31 [0.01s<0.28s, 111.11 count/s] tokenizer init ok[I][                            Init][  26]: LLaMaEmbedSelector use mmap
+100% | ████████████████████████████████ |  31 /  31 [8.15s<8.15s, 3.80 count/s] init post axmodel ok,remain_cmm(1434 MB)[I][                            Init][ 241]: max_token_len : 1653
+[I][                            Init][ 246]: kv_cache_size : 256, kv_cache_num: 1653
+[I][                            Init][ 254]: prefill_token_num : 128
+[I][                            Init][ 263]: LLM init ok
+Type "q" to exit, Ctrl+c to stop current running
+>> こんにちは！
+[I][                             Run][ 484]: ttft: 1066.67 ms
+こんにちは！
+何かお手伝いできることはありますか？ 😊
+```
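
The two-step launch in the README (start the tokenizer server, then run the debug script) can be wrapped in a small shell helper so that the tokenizer is started in the background and stopped again when the chat session ends. This is only a minimal sketch: `with_background` and `STARTUP_WAIT` are hypothetical names introduced here, and the fixed start-up pause is an assumption, since the README does not say how long the tokenizer takes to become ready or which port it listens on.

```shell
# Sketch: keep a background service alive while a foreground command runs.
# with_background and STARTUP_WAIT are illustrative names, not part of the repository.
with_background() {
    service_cmd=$1   # command line of the service, e.g. the tokenizer server
    shift            # remaining arguments form the foreground command

    # exec makes the service replace the helper shell, so $! is the service's PID.
    sh -c "exec $service_cmd" &
    service_pid=$!

    # Assumption: a fixed pause is long enough for the service to come up.
    sleep "${STARTUP_WAIT:-3}"

    # Run the foreground command and remember its exit status.
    "$@"
    cmd_rc=$?

    # Stop the service and reap it before returning.
    kill "$service_pid" 2>/dev/null
    wait "$service_pid" 2>/dev/null
    return "$cmd_rc"
}
```

On the Module LLM this might be called as `with_background "python3 tinyswallow_tokenizer.py" ./run_TinySwallow_1.5B_prefill_ax630c.sh`, with `STARTUP_WAIT` raised if the tokenizer needs longer to start. Polling the tokenizer's HTTP port would be more robust than a fixed pause, but the port is not given in the README, so the sketch does not guess it.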