---
library_name: transformers
license: mit
language:
- ja
base_model:
- cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese
---

# Model Card for DeepSeek-R1-Distill-Qwen-14B-Japanese-chat

## Model Details

### Model Description

This model is fine-tuned on conversational data for chat in Japanese.

- Developed by: [flypg](https://huggingface.co/flypg)
- Model type: Causal Language Model
- Language(s) (NLP): Japanese
- License: MIT
- Finetuned from model: cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese

## Uses

### Direct Use

The model can be used directly for casual conversation in Japanese.

## Bias, Risks, and Limitations

- Small Dataset: The model was fine-tuned on a relatively small dataset (fewer than 1,000 conversations), so it may overfit or produce repetitive answers.
- Bias / Toxicity: As with any LLM, it may generate offensive or biased outputs in certain contexts.
- Limitations: Use the model beyond casual conversation at your own risk.

## Get Started with the Model

Below is a minimal example of how to load and use this model for inference in Python.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "flypg/DeepSeek-R1-Distill-Qwen-14B-Japanese-chat"

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",          # place layers on the available device(s)
    torch_dtype=torch.float16,  # half precision to reduce memory use
)
model.eval()

prompt = "your prompt"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a continuation without tracking gradients.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded text contains the prompt followed by the generated tokens.
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
```

## Training Details

### Training Procedure & Hyperparameters

- Fine-Tuning Method: LoRA
- Framework & Tools: Hugging Face Transformers, PEFT
- Hyperparameters:
  - Learning rate: 1e-5
  - Batch size: 2 (with gradient accumulation)
  - Num epochs: 3
  - Training regime: fp16 mixed precision

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- Hardware Type: NVIDIA A100 PCIe
- Hours used: 5
- Cloud Provider: Private Infrastructure
- Compute Region: US-central
- Carbon Emitted: 320 g CO2 eq.

## Citation

If you use this model in your research or work, please cite it using the following BibTeX entry:

```bibtex
@misc{flypg2025deepseekr1japanesechat,
  title={DeepSeek-R1-Distill-Qwen-14B-Japanese-chat: A Fine-Tuned Qwen-based Model for Chat in Japanese},
  author={flypg},
  year={2025},
  howpublished={\url{https://huggingface.co/flypg/DeepSeek-R1-Distill-Qwen-14B-Japanese-chat}},
  note={Accessed: YYYY-MM-DD}
}
```

## Contact

[kenkun091](https://github.com/kenkun091)

Please feel free to open an issue.
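
As a rough cross-check of the figures in the Environmental Impact section, the sketch below applies the usual energy-times-carbon-intensity arithmetic. The card reports only the hours used and the total emissions. The single GPU and the constant 250 W power draw are assumptions made here for illustration, and the helper names `energy_kwh` and `emissions_g` are hypothetical. The resulting intensity is implied by those assumptions and is not a value stated by the author or necessarily the one used by the calculator.

```python
def energy_kwh(power_watts: float, hours: float, num_gpus: int = 1) -> float:
    """Energy drawn by the GPUs in kWh, assuming constant power draw."""
    return power_watts / 1000.0 * hours * num_gpus


def emissions_g(energy: float, intensity_g_per_kwh: float) -> float:
    """CO2-equivalent emissions in grams for a given grid carbon intensity."""
    return energy * intensity_g_per_kwh


# Values reported in this card.
HOURS_USED = 5
REPORTED_EMISSIONS_G = 320

# Assumption (not stated in the card): one GPU drawing a constant 250 W.
ASSUMED_POWER_W = 250

energy = energy_kwh(ASSUMED_POWER_W, HOURS_USED)

# Carbon intensity that would reproduce the reported total under the assumptions.
implied_intensity = REPORTED_EMISSIONS_G / energy

print(f"Energy: {energy} kWh")
print(f"Implied carbon intensity: {implied_intensity} g CO2 eq./kWh")
```

Changing `ASSUMED_POWER_W` or passing a different `num_gpus` shows how sensitive the estimate is to hardware details the card does not specify.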