---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- CoT
- Conversational
- text-generation-inference
model-index:
- name: QwQ-LCoT-14B-Conversational
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 40.47
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT-14B-Conversational
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 45.63
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT-14B-Conversational
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 31.42
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT-14B-Conversational
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 13.31
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT-14B-Conversational
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 20.62
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT-14B-Conversational
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 47.54
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT-14B-Conversational
      name: Open LLM Leaderboard
---
# **QwQ-LCoT-14B-Conversational**

QwQ-LCoT-14B-Conversational is built on Qwen2.5-14B-Instruct and fine-tuned for complex, chain-of-thought-driven long conversations. The fine-tuning targets tasks that require step-by-step reasoning, detailed explanations, and a nuanced understanding of intricate topics, optimizing the model for use cases that demand precision, depth, and adaptability in dialogue.

This makes it particularly effective for long-form discussions, detailed problem-solving, and multi-step reasoning. Its ability to maintain coherent, meaningful conversations over extended contexts makes it well suited to scenarios requiring thoughtful, dynamic interaction.


| Rank | Type | Model                             | Average | IFEval | BBH   | MATH  | GPQA  | MUSR  | MMLU-PRO | CO₂ Cost (kg) | Date       |
|------|------|-----------------------------------|---------|--------|-------|-------|-------|-------|----------|---------------|------------|
| 323  | 🔶   | [prithivMLmods/QwQ-LCoT-14B-Conversational](#) | 33.17 | 40.47 | 45.63 | 31.42 | 13.31 | 20.62 | 47.54 | 1.95 | 01/20/2025 |

## **Key Features**

### **Enhanced Knowledge and Capabilities**
- **Coding and Mathematics**: Significantly improved performance in coding and mathematical tasks, thanks to specialized expert models in these domains.

### **Advanced Instruction Following**
- **Instruction Following**: Enhanced ability to follow instructions accurately, even for complex tasks.
- **Long Text Generation**: Capable of generating long texts exceeding 8,000 tokens.
- **Structured Data Understanding**: Improved understanding of structured data such as tables.
- **JSON Generation**: Exceptional ability to generate structured outputs, including JSON (see the example after the quickstart below).

### **Resilient and Versatile**
- **Prompt Diversity**: Greater resilience to diverse system prompts, enhancing role-play scenarios and condition-setting for chatbots.

### **Long-Context Support**
- **Context Length**: Supports up to 128,000 tokens, with the ability to generate up to 8,000 tokens in a single response (a long-context loading sketch follows this list).
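
For inputs beyond 32,768 tokens, the base Qwen2.5 models document a YaRN rope-scaling recipe. Below is a minimal sketch, assuming this fine-tune inherits that recipe; the `rope_scaling` values mirror the upstream Qwen2.5 `config.json` snippet and should be verified against the base model's documentation.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-LCoT-14B-Conversational"

# Assumption: this fine-tune inherits the base Qwen2.5 long-context (YaRN)
# recipe; these values mirror the upstream Qwen2.5 config.json snippet.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Note that static YaRN scaling applies regardless of input length, so the upstream docs suggest enabling it only when long contexts are actually needed.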

## **Quickstart**

The following snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-LCoT-14B-Conversational"

# Load the model weights (auto-selected dtype) and shard across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat history into the model's expected prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated reply remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
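
To exercise the structured-output capability noted above, the snippet below is a minimal sketch reusing `model` and `tokenizer` from the quickstart; the system prompt and target schema are illustrative, not part of the model's documented training setup.

```python
import json

# Ask for a JSON-only reply; the schema in the prompt is a made-up example.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Reply only with valid JSON."},
    {
        "role": "user",
        "content": 'Extract the fields {"name": string, "year": number} from: '
                   '"Qwen2.5 was released by Alibaba Cloud in 2024."',
    },
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=128)
reply = tokenizer.batch_decode(
    [generated_ids[0][model_inputs.input_ids.shape[1]:]],
    skip_special_tokens=True,
)[0]

# json.loads raises if the model wraps the JSON in prose; validate or retry
# in production use.
data = json.loads(reply)
print(data)
```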

### **Multilingual Support**

QwQ-LCoT-14B-Conversational offers robust multilingual support across more than 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic. This makes it well suited to global applications such as multilingual customer support, cross-cultural communication, and localized content creation, with consistent quality across diverse linguistic contexts.
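
For example, the same chat-template pipeline as the quickstart works with a non-English prompt; the snippet below is illustrative only and reuses `model` and `tokenizer` from above.

```python
# Illustrative only: a French prompt fed through the same pipeline as the
# English quickstart (reuses `model` and `tokenizer` from above).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Donne-moi une courte introduction aux grands modèles de langage."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=256)
response = tokenizer.batch_decode(
    [generated_ids[0][model_inputs.input_ids.shape[1]:]],
    skip_special_tokens=True,
)[0]
print(response)  # the reply typically follows the language of the prompt
```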

## **Applications**

QwQ-LCoT-14B-Conversational is ideal for:
- Long-form conversational AI
- Complex reasoning and chain-of-thought explanations
- Multilingual communication
- Structured data generation and processing
- Enhanced role-play and chatbot implementation

## **Intended Use**

1. **Long-Form Dialogue Systems**: QwQ-LCoT-14B-Conversational is designed for creating conversational agents capable of engaging in extended, context-rich dialogues, making it suitable for applications like customer support, virtual assistants, and interactive storytelling.

2. **Complex Reasoning Tasks**: The model excels at tasks requiring step-by-step reasoning, such as solving mathematical problems, coding challenges, and logical puzzles.

3. **Multilingual Communication**: With support for over 29 languages, the model is ideal for global applications, including multilingual customer service, translation, and cross-cultural communication.

4. **Structured Data Processing**: The model’s ability to understand and generate structured data (e.g., tables, JSON) makes it useful for data analysis, report generation, and API integration.

5. **Content Generation**: It can generate high-quality, long-form content, including articles, essays, and technical documentation, across various domains and languages.

6. **Role-Play and Chatbots**: The model’s resilience to diverse system prompts enhances its ability to simulate characters, role-play scenarios, and implement dynamic chatbot interactions.

## **Limitations**

1. **Performance Variability Across Languages**: While the model supports multiple languages, its performance may vary depending on the language, with better results for languages more prevalent in its training data.

2. **Handling of Niche Topics**: The model may struggle to provide accurate information or generate high-quality content for highly specialized or niche topics not covered extensively in its training data.

3. **Complex Multi-Step Reasoning**: Although optimized for reasoning tasks, the model may occasionally produce incorrect or incomplete results for highly complex or ambiguous problems.

4. **Bias and Ethical Concerns**: As with any large language model, QwQ-LCoT-14B-Conversational may inherit biases present in its training data, leading to potential ethical concerns or inappropriate outputs in certain contexts.

5. **Context Limitations**: Despite its large context window, the model may still face challenges in maintaining coherence and relevance for extremely long or dense inputs.

6. **Resource Intensive**: As a large-scale model with 14 billion parameters, it requires substantial computational resources for both inference and deployment, limiting its use in resource-constrained environments.

7. **Instruction Ambiguity**: The model’s performance can degrade when instructions are ambiguous, vague, or conflicting, potentially leading to outputs that do not align with user expectations.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__QwQ-LCoT-14B-Conversational-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FQwQ-LCoT-14B-Conversational&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

|      Metric       |Value (%)|
|-------------------|--------:|
|**Average**        |    33.16|
|IFEval (0-Shot)    |    40.47|
|BBH (3-Shot)       |    45.63|
|MATH Lvl 5 (4-Shot)|    31.42|
|GPQA (0-shot)      |    13.31|
|MuSR (0-shot)      |    20.62|
|MMLU-PRO (5-shot)  |    47.54|