Qwen2.5-14B-DeepSeek-R1-1M

This merged model combines the reasoning strengths of DeepSeek-R1-Distill-Qwen-14B with the long-context capabilities of Qwen2.5-14B-Instruct-1M for versatile performance.

Merge config

models:
  - model: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
    parameters:
      weight: 1
      density: 1

merge_method: ties
base_model: "Qwen/Qwen2.5-14B-Instruct-1M"
parameters:
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16

In addition, I made some minor adjustments to the tokenizer configuration.
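
A config like the one above is applied with mergekit (either the mergekit-yaml CLI or its Python API). Below is a minimal sketch of the Python route, assuming mergekit is installed and the YAML above is saved as merge-config.yaml; the file name and output path are illustrative, not part of the original recipe:

import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Parse the YAML merge config shown above (path is an assumption)
with open("merge-config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run the TIES merge and write the merged model to ./merged
run_merge(
    merge_config,
    out_path="./merged",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is available
        copy_tokenizer=True,             # copy the base model's tokenizer to the output
    ),
)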

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mkurman/Qwen2.5-14B-DeepSeek-R1-1M"

# device_map="auto" spreads weights across available devices;
# torch_dtype="auto" keeps the model's native BF16 precision
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a Python script to merge two CSV files."
messages = [
    {"role": "system", "content": "You are an expert programmer."},
    {"role": "user", "content": prompt}
]

# add_generation_prompt=True appends the assistant turn marker so the model
# produces a reply instead of continuing the user message
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
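
Note that the decoded output above includes the formatted prompt. If you only want the model's reply, you can slice off the prompt tokens first; a small optional variation on the snippet above:

# Keep only the tokens generated after the prompt
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(reply)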

You can also run the model in LM Studio or Ollama using the provided GGUF files.

License

This model is released under the Apache 2.0 license to support open-source contribution and collaboration.
