---
library_name: transformers
license: mit
language:
- ja
base_model:
- cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese
---
# Model Card for DeepSeek-R1-Distill-Qwen-14B-Japanese-chat
## Model Details
### Model Description
This model is fine-tuned on conversational data for casual chat in Japanese.
- Developed by: [flypg](https://huggingface.co/flypg)
- Model type: Causal Language Model
- Language(s) (NLP): Japanese
- License: MIT
- Finetuned from model: [cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese](https://huggingface.co/cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese)
## Uses
### Direct Use
The model can be directly used for casual conversation in Japanese.
## Bias, Risks, and Limitations
- Small dataset: the model is fine-tuned on a relatively small dataset (fewer than 1,000 conversations), so it may overfit or produce repetitive answers.
- Bias / toxicity: as with any LLM, it can generate offensive or biased outputs in certain contexts.
- Limited scope: use of the model beyond casual conversation is at your own risk.
## Get Started with the Model
Below is a minimal example of how to load and use this model for inference in Python.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "flypg/DeepSeek-R1-Distill-Qwen-14B-Japanese-chat"

# Load the tokenizer and the model; device_map="auto" places the weights
# across the available devices, and fp16 halves the memory footprint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
)
model.eval()

prompt = "your prompt"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 100 new tokens with nucleus sampling.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
```
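For multi-turn chat, you can format the conversation with the tokenizer's chat template instead of passing a raw string. The sketch below continues from the snippet above and assumes the tokenizer inherits a chat template from the DeepSeek-R1-Distill-Qwen base model; the example message is illustrative.
```python
# Minimal multi-turn sketch; assumes the tokenizer ships a chat template
# inherited from the DeepSeek-R1-Distill-Qwen base model.
messages = [
    {"role": "user", "content": "こんにちは！おすすめの週末の過ごし方はありますか？"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```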
## Training Details
### Training Procedure & Hyperparameters
- Fine-Tuning Method: LoRA (see the sketch after this list)
- Framework & Tools:
  - Hugging Face Transformers
  - PEFT
- Hyperparameters:
- Learning rate: 1e-5
- Batch size: 2 (with gradient accumulation)
- Num epochs: 3
- Training regime: fp16 mixed precision
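The exact LoRA configuration is not recorded in this card. The sketch below shows how such a run could be wired up with PEFT: the rank, alpha, dropout, target modules, and gradient-accumulation steps are assumptions, while the learning rate, batch size, epoch count, and fp16 regime follow the list above.
```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Hypothetical LoRA settings: rank, alpha, dropout, and target modules
# are assumptions, not the values used for this model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# Documented hyperparameters: lr 1e-5, batch size 2 with gradient
# accumulation, 3 epochs, fp16 mixed precision. The accumulation
# step count is an assumption.
training_args = TrainingArguments(
    output_dir="./lora-out",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,
)
# A Trainer plus the conversational dataset (not released) would complete the run.
```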
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- Hardware Type: NVIDIA A100 (PCIe)
- Hours used: 5
- Cloud Provider: Private Infrastructure
- Compute Region: US-central
- Carbon Emitted: 320 g CO2eq
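As a sanity check, the reported figure is consistent with the calculator's simple energy model. The power draw and grid carbon intensity below are illustrative assumptions, not measurements:
```python
# Back-of-the-envelope check of the reported figure, following the
# energy-based estimate of Lacoste et al. (2019). The power draw and
# grid carbon intensity are illustrative assumptions.
gpu_power_kw = 0.4                            # assumed average draw of one A100 PCIe
hours = 5                                     # from the card
energy_kwh = gpu_power_kw * hours             # 2.0 kWh
carbon_intensity = 160                        # assumed g CO2eq per kWh for the region
emissions_g = energy_kwh * carbon_intensity   # 320 g CO2eq
print(f"{emissions_g:.0f} g CO2eq")
```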
## Citation
If you use this model in your research or work, please cite it using the following BibTeX entry:
```bibtex
@misc{flypg2025deepseekr1distillqwen14bjapanesechat,
  title={DeepSeek-R1-Distill-Qwen-14B-Japanese-chat: A Fine-Tuned Qwen-based Model for Chat in Japanese},
  author={flypg},
  year={2025},
  howpublished={\url{https://huggingface.co/flypg/DeepSeek-R1-Distill-Qwen-14B-Japanese-chat}},
  note={Accessed: YYYY-MM-DD}
}
```
## Contact
Maintained by [kenkun091](https://github.com/kenkun091). Please feel free to open an issue.