---
library_name: transformers
license: mit
language:
- ja
base_model:
- cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese
---
# Model Card for DeepSeek-R1-Distill-Qwen-14B-Japanese-chat
## Model Details
### Model Description
This model is fine-tuned on conversational data for casual chat in Japanese.
- Developed by: [flypg](https://huggingface.co/flypg)
- Model type: Causal Language Model
- Language(s) (NLP): Japanese
- License: MIT
- Finetuned from model: [cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese](https://huggingface.co/cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese)
## Uses
### Direct Use
The model can be directly used for casual conversation in Japanese.
## Bias, Risks, and Limitations
- Small dataset: the model is fine-tuned on a relatively small dataset (fewer than 1,000 conversations), so it may overfit or produce repetitive answers.
- Bias / toxicity: as with any LLM, it can generate offensive or biased outputs in certain contexts.
- Limited scope: use of the model beyond casual conversation is at your own risk.
## Get Started with the Model
Below is a minimal example of how to load and use this model for inference in Python.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "flypg/DeepSeek-R1-Distill-Qwen-14B-Japanese-chat"

# Load the tokenizer and the model; device_map="auto" places the weights
# across the available devices, and fp16 halves the memory footprint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
)
model.eval()

prompt = "your prompt"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 100 new tokens with nucleus sampling.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
```
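For multi-turn chat, you can format the conversation with the tokenizer's chat template instead of passing a raw string. The sketch below continues from the snippet above and assumes the tokenizer inherits a chat template from the DeepSeek-R1-Distill-Qwen base model; the example message is illustrative.
```python
# Minimal multi-turn sketch; assumes the tokenizer ships a chat template
# inherited from the DeepSeek-R1-Distill-Qwen base model.
messages = [
    {"role": "user", "content": "こんにちは！おすすめの週末の過ごし方はありますか？"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```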
## Training Details
### Training Procedure & Hyperparameters
- Fine-Tuning Method: LoRA (see the sketch after this list)
- Framework & Tools:
  - Hugging Face Transformers
  - PEFT
- Hyperparameters:
- Learning rate: 1e-5
- Batch size: 2 (with gradient accumulation)
- Num epochs: 3
- Training regime: fp16 mixed precision
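The exact LoRA configuration is not recorded in this card. The sketch below shows how such a run could be wired up with PEFT: the rank, alpha, dropout, target modules, and gradient-accumulation steps are assumptions, while the learning rate, batch size, epoch count, and fp16 regime follow the list above.
```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Hypothetical LoRA settings: rank, alpha, dropout, and target modules
# are assumptions, not the values used for this model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# Documented hyperparameters: lr 1e-5, batch size 2 with gradient
# accumulation, 3 epochs, fp16 mixed precision. The accumulation
# step count is an assumption.
training_args = TrainingArguments(
    output_dir="./lora-out",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,
)
# A Trainer plus the conversational dataset (not released) would complete the run.
```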
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- Hardware Type: NVIDIA A100 (PCIe)
- Hours used: 5
- Cloud Provider: Private Infrastructure
- Compute Region: US-central
- Carbon Emitted: 320 g CO2eq
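As a sanity check, the reported figure is consistent with the calculator's simple energy model. The power draw and grid carbon intensity below are illustrative assumptions, not measurements:
```python
# Back-of-the-envelope check of the reported figure, following the
# energy-based estimate of Lacoste et al. (2019). The power draw and
# grid carbon intensity are illustrative assumptions.
gpu_power_kw = 0.4                            # assumed average draw of one A100 PCIe
hours = 5                                     # from the card
energy_kwh = gpu_power_kw * hours             # 2.0 kWh
carbon_intensity = 160                        # assumed g CO2eq per kWh for the region
emissions_g = energy_kwh * carbon_intensity   # 320 g CO2eq
print(f"{emissions_g:.0f} g CO2eq")
```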
## Citation
If you use this model in your research or work, please cite it using the following BibTeX entry:
```bibtex
@misc{flypg2025deepseekr1distillqwen14bjapanesechat,
  title={DeepSeek-R1-Distill-Qwen-14B-Japanese-chat: A Fine-Tuned Qwen-based Model for Chat in Japanese},
  author={flypg},
  year={2025},
  howpublished={\url{https://huggingface.co/flypg/DeepSeek-R1-Distill-Qwen-14B-Japanese-chat}},
  note={Accessed: YYYY-MM-DD}
}
```
## Contact
Maintained by [kenkun091](https://github.com/kenkun091). Please feel free to open an issue.