TinyDeepSeek-JP-1.5B

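This model was created by applying TAID, a new distillation method proposed by SakanaAI, to cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese (a small distilled DeepSeek-R1 model with additional Japanese training), compressing it into a smaller model.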

Teacher model : cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese
Student model : SakanaAI/TinySwallow-1.5B-Instruct
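TAID (Temporally Adaptive Interpolated Distillation) trains the student against an intermediate target that moves gradually from the student's own distribution toward the teacher's. The sketch below is a minimal, single-token illustration and rests on two assumptions drawn from the TAID paper. First, the intermediate target is the softmax of a logit-space interpolation, softmax((1 - t) * student_logits + t * teacher_logits). Second, the loss is KL(target || student). In TAID the coefficient t is increased adaptively during training, but here it is simply a fixed argument. The names softmax, kl_divergence and taid_loss are hypothetical helpers and do not come from any released TAID code.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def taid_loss(student_logits, teacher_logits, t):
    """Sketch of a TAID-style loss for one token position (assumed formulation).

    t = 0: the target equals the student, so the loss is 0.
    t = 1: the target equals the teacher (ordinary forward-KL distillation).
    """
    # Intermediate target: interpolate in logit space. In a real training loop the
    # student logits used here would be a detached copy (no gradient flows through them).
    mixed = [(1 - t) * s + t * te for s, te in zip(student_logits, teacher_logits)]
    target = softmax(mixed)
    student = softmax(student_logits)
    return kl_divergence(target, student)


# Illustrative logits over a 3-token vocabulary (made-up numbers)
student_logits = [2.0, 0.5, -1.0]
teacher_logits = [0.0, 3.0, 1.0]

# TAID raises t adaptively during training; these fixed values are for illustration only
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, taid_loss(student_logits, teacher_logits, t))
```

At t = 0 the loss is zero. As t grows, the target moves toward the teacher, which lets the student learn from an easier, nearby target early in training.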

Note: This model is still under development.

Uses

Usage terms follow those of the original models.
This model is provided for research and development purposes only and should be considered as an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user's own risk, and its performance and outcomes are not guaranteed. EQUES Inc. shall not be liable for any direct, indirect, special, incidental, or consequential damages, or any loss arising from the use of this model, regardless of the results obtained. Users must fully understand the risks associated with the use of this model and use it at their own discretion.

Output Examples

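Tell me about large language models.

First, training a large language model requires an enormous amount of data. For example, GPT-3 (OpenAI) has 175 billion parameters and was trained on about 2 trillion words of text data. From this large amount of data, the model learns patterns and structures. Next, the model consists of multiple layers, and each layer builds a new representation from the output of the previous layer. For example, the Transformer architecture uses an attention mechanism to take the entire input context into account, which makes it possible to capture long-range dependencies. Furthermore, these models are continually being updated, and recent research has improved their accuracy and performance. This enables large language models to handle a wide variety of tasks. However, training large language models requires substantial computational resources. In particular, the hardware needed during training, such as GPUs and TPUs, is costly. Data privacy and security are also important issues. Future research needs to develop more efficient and sustainable training methods.

These are the main elements of training large language models, but there are many others, such as model optimization and validation. All of these elements are essential for improving a model's accuracy and practicality.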

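The following is an arithmetic problem. Think carefully before answering. A regular hexagon can be divided into six equilateral triangles. If the perimeter of one of these triangles is 21 inches, what is the perimeter of the regular hexagon in inches?

First, a regular hexagon is made up of six equilateral triangles. That is, the perimeter of each equilateral triangle is 30.54 inches (about 78.9 cm).

Next, we find the perimeter of the regular hexagon from the perimeter of this one triangle. However, that calculation requires additional information: for example, we would need to know the hexagon's interior angles or number of sides and use them in the calculation.

However, since the problem asks for "the perimeter of the regular hexagon," it refers to the total perimeter. The perimeter of the regular hexagon is the sum of the perimeters of all six triangles.

Therefore, the perimeter of the regular hexagon is 30.54 × 6 = 183.24 inches (about 55.42 cm).

Answer: 183.24 inches (about 55.42 cm)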
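For reference, the expected answer can be checked directly. Each equilateral triangle's side equals the hexagon's side, so one side is 21 / 3 = 7 inches. The hexagon's boundary is made of six such sides, giving 6 × 7 = 42 inches rather than the 183.24 inches in the response above. The function below is an illustrative helper and is not part of the model card.

```python
def hexagon_perimeter_from_triangle(triangle_perimeter):
    # An equilateral triangle has three equal sides, so one side is perimeter / 3.
    side = triangle_perimeter / 3
    # The hexagon's outer boundary consists of six such sides. The edges shared
    # between neighbouring triangles lie inside the hexagon and are not counted.
    return 6 * side


print(hexagon_perimeter_from_triangle(21))  # 42.0
```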

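Generate Python code for web scraping.

To extract information from text, you first need to understand the HTML structure. For example, you may encounter cases such as the following:

Images are included
Links or anchors are present
There are subsections
All of these are treated as HTML elements, and you use the corresponding method for each.

For this task, follow these steps:

Load the HTML document
Identify the elements
Specify the parameters
Retrieve the values
Please explain each step.

Also, import the required libraries.

Example: to extract URLs from text, you can implement it as follows: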

```python
from bs4 import BeautifulSoup
import requests

url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
urls = soup.find_all('a')

for link in urls:
    print(link.get('href'))
```

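This code fetches the URL with requests.get() and parses the HTML with BeautifulSoup(). It then uses find_all('a') to search for all links, which lets you retrieve the URL of every link.

However, this method cannot handle multiple levels of hierarchy (subsections). In that case, deeper parsing is required.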
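The response's closing claim does not hold for the code it shows. BeautifulSoup's find_all() searches all descendants recursively by default, so it already finds links inside nested sections. The same link extraction can be sketched with only the standard library's html.parser.HTMLParser, whose handle_starttag() callback fires for every start tag at any nesting depth. LinkCollector and SAMPLE_HTML are illustrative names, and the HTML is an inline sample, not a fetched page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from every <a> tag, however deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Called for each start tag in document order; tag and attribute names are lowercased.
        if tag == "a":
            for name, value in attrs:
                # Skip <a> tags without an href, or with a bare href that has no value
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical sample page with a link nested inside a subsection
SAMPLE_HTML = """
<html><body>
  <a href="/top">Top</a>
  <section><h2>Sub</h2>
    <div><p><a href="https://example.com/deep">Deep link</a></p></div>
    <a name="anchor-only">No href</a>
  </section>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.hrefs)  # ['/top', 'https://example.com/deep']
```

Relative values such as /top can be resolved against the page URL with urllib.parse.urljoin when absolute URLs are needed.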


### Sample Usage
```python
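import os
# Restrict the process to the first GPU; this must be set before torch is imported
os.environ["CUDA_VISIBLE_DEVICES"]="0"
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
# (device_map="auto" requires the accelerate package)
model_name = "EQUES/TinyDeepSeek-JP-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt: "Tell me about large language models."
prompt = "大規模言語モデルについて教えて。"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and append the assistant turn header
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Drop the prompt tokens so that only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the generated tokens into text
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```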

License

Apache-2.0

Acknowledgement

SakanaAI & Swallow team : development and release of TinySwallow-1.5B
SakanaAI : development of TAID
CyberAgent : development of DeepSeek-R1-Distill-Qwen-14B-Japanese

