---
base_model: ibm-granite/granite-3.1-2b-instruct
tags:
- text-generation
- transformers
- gguf
- english
- granite
- text-generation-inference
- inference-endpoints
- conversational
- 4-bit
- 5-bit
- 8-bit
- ruslanmv
license: apache-2.0
language:
- en
---

# Granite-3.1-2B-Reasoning-GGUF (Quantized for Efficiency)

## Model Overview

This is a **GGUF quantized version** of **ruslanmv/granite-3.1-2b-Reasoning**, which was fine-tuned from **ibm-granite/granite-3.1-2b-instruct**. The **GGUF format** enables efficient inference on **CPU and GPU**, and the model is provided at three **quantization levels** (4-bit, 5-bit, and 8-bit).

- **Developed by:** [ruslanmv](https://huggingface.co/ruslanmv)  
- **License:** Apache 2.0  
- **Base Model:** [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct)  
- **Fine-tuned for:** Logical reasoning, structured problem-solving, long-context tasks  
- **Quantized GGUF versions available:**  
  - **4-bit:** `Q4_K_M`  
  - **5-bit:** `Q5_K_M`  
  - **8-bit:** `Q8_0`  
- **Supported Languages:** English  
- **Architecture:** **Granite**  
- **Model Size:** **2.53B params**  

---

## Why Use the GGUF Quantized Version?

The **GGUF format** is designed for efficient **CPU and GPU inference**, offering:  

✅ **Lower memory usage**, so the model runs on consumer hardware  
✅ **Faster inference** with minimal loss in reasoning quality  
✅ **Compatibility with popular inference engines** such as llama.cpp, ctransformers, and KoboldCpp  

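All three quantized files are hosted in this repository. If you need to fetch one programmatically, here is a minimal sketch using the `huggingface_hub` library; the exact GGUF filename is an assumption, so verify it against the repository's file listing:

```python
# Minimal sketch: download one of the quantized GGUF files from the Hub.
# The filename below is an assumption -- check the files actually listed
# in the repository before running.
from huggingface_hub import hf_hub_download

model_file = hf_hub_download(
    repo_id="ruslanmv/granite-3.1-2b-Reasoning-GGUF",
    filename="granite-3.1-2b-Reasoning.Q4_K_M.gguf",  # assumed filename
)
print(model_file)  # local path to the downloaded file
```
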
---

## Installation & Usage  

To use this model with **llama.cpp**, install the required dependencies:

```bash
pip install llama-cpp-python
```
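
By default, `pip` installs a CPU-only build. llama-cpp-python can also be compiled with GPU offload by passing CMake flags at install time; the exact flag depends on your version and backend (recent releases use `GGML_CUDA`, older ones used `LLAMA_CUBLAS`), so treat this as a sketch:

```bash
# Sketch: reinstall with CUDA offload enabled (flag names vary by version)
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```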

### Running the Model  

To run the model with the **llama-cpp-python** bindings:

```python
from llama_cpp import Llama

# Path to the downloaded GGUF file (adjust to your local copy)
model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"

llm = Llama(model_path=model_path)

input_text = "Can you explain the difference between inductive and deductive reasoning?"
output = llm(input_text, max_tokens=400)

print(output["choices"][0]["text"])
```
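
For conversational use, llama-cpp-python also exposes an OpenAI-style chat API. A minimal sketch; whether the model's chat template is applied automatically depends on the metadata embedded in the GGUF file, and `n_ctx` is an assumed context size:

```python
from llama_cpp import Llama

# Same GGUF file as above; n_ctx sets the context window (assumed value).
llm = Llama(
    model_path="path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain the difference between inductive and deductive reasoning."}
    ],
    max_tokens=400,
)

print(response["choices"][0]["message"]["content"])
```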

Alternatively, using **ctransformers**:

```bash
pip install ctransformers
```

```python
from ctransformers import AutoModelForCausalLM

# Path to the downloaded GGUF file (adjust to your local copy)
model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"

# ctransformers has no dedicated Granite model type; "llama" is the
# closest match. gpu_layers controls how many layers are offloaded to
# the GPU -- set it to 0 for CPU-only inference.
model = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama", gpu_layers=50)

input_text = "What are the key principles of logical reasoning?"
output = model(input_text, max_new_tokens=400)

print(output)
```
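
If no GPU is available, set `gpu_layers=0` to run entirely on CPU. Should the `model_type="llama"` mapping be rejected for this Granite-based model, fall back to the llama-cpp-python route above.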

---

## Intended Use  

Granite-3.1-2B-Reasoning-GGUF is optimized for **efficient inference** while maintaining strong **reasoning capabilities**, making it ideal for:  

- **Logical and analytical problem-solving**  
- **Text-based reasoning tasks**  
- **Mathematical and symbolic reasoning**  
- **Advanced instruction-following**  

This model is particularly useful for **CPU-based deployments** and users who need **low-memory, high-performance** text generation.

---

## License & Acknowledgments  

This model is released under the **Apache 2.0** license. It is fine-tuned from IBM's **Granite 3.1-2B-Instruct** model and **quantized to GGUF** for efficient local inference. Special thanks to the **IBM Granite Team** for developing the base model.  

For more details, visit the [IBM Granite Documentation](https://huggingface.co/ibm-granite).  

---

### Citation  

If you use this model in your research or applications, please cite:  

```bibtex
@misc{ruslanmv2025granite,
  title={Fine-Tuning and GGUF Quantization of Granite-3.1 for Advanced Reasoning},
  author={Ruslan M.V.},
  year={2025},
  url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-GGUF}
}
```