---
base_model:
- ValiantLabs/Qwen3-8B-ShiningValiant3
- ValiantLabs/Qwen3-8B-Esper3
- Qwen/Qwen3-8B
library_name: transformers
tags:
- mergekit
- merge
- qwen
- qwen-3
- qwen-3-8b
- 8b
- reasoning
- code
- code-reasoning
- code-instruct
- python
- javascript
- dev-ops
- jenkins
- terraform
- scripting
- powershell
- azure
- aws
- gcp
- cloud
- science
- science-reasoning
- physics
- biology
- chemistry
- earth-science
- astronomy
- machine-learning
- artificial-intelligence
- compsci
- computer-science
- information-theory
- ML-Ops
- math
- cuda
- deep-learning
- transformers
- agentic
- LLM
- neuromorphic
- self-improvement
- complex-systems
- cognition
- linguistics
- philosophy
- logic
- epistemology
- simulation
- game-theory
- knowledge-management
- creativity
- problem-solving
- architect
- engineer
- developer
- creative
- analytical
- expert
- rationality
- conversational
- chat
- instruct
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
- sequelbox/Mitakihara-DeepSeek-R1-0528
- sequelbox/Titanium2.1-DeepSeek-R1
- sequelbox/Tachibana2-DeepSeek-R1
- sequelbox/Raiden-DeepSeek-R1
---
# PlumEsper

PlumEsper is a merge of pre-trained language models created with [mergekit](https://github.com/cg123/mergekit). It combines the specialty and general reasoning skills of Esper 3 8B and Shining Valiant 3 8B.

## Merge Details

### Merge Method

This model was merged with the [DELLA](https://arxiv.org/abs/2406.11617) merge method, using [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as the base model.
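
For readers unfamiliar with DELLA, the following is a minimal, simplified sketch of its three stages (magnitude-based drop, sign election, fusion) on flat lists of floats. It is not mergekit's implementation: the function names, the linear rank-to-probability mapping, and the `epsilon=0.1` value are assumptions chosen for illustration, and the method's final scaling factor is taken to be 1.

```python
import random


def magnitude_keep_probs(deltas, density, epsilon):
    """Give larger-magnitude deltas a higher chance of surviving the drop step.

    Keep probabilities are spread linearly by magnitude rank over
    [density - epsilon, density + epsilon], so their mean equals `density`.
    (The linear spread is an assumption of this sketch.)
    """
    n = len(deltas)
    if n == 1:
        return [density]
    # Indices sorted from smallest to largest |delta|.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    probs = [0.0] * n
    for rank, idx in enumerate(order):
        probs[idx] = (density - epsilon) + 2 * epsilon * rank / (n - 1)
    return probs


def drop_and_rescale(deltas, keep_probs, rng):
    """Randomly zero each delta; rescale survivors by 1 / keep probability."""
    return [d / p if rng.random() < p else 0.0 for d, p in zip(deltas, keep_probs)]


def elect_and_fuse(pruned, weights, normalize=True):
    """Keep only contributions that agree with the dominant sign, then sum them."""
    merged = []
    for column in zip(*pruned):
        contribs = [w * d for w, d in zip(weights, column)]
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        agreeing = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        value = sum(c for c, _ in agreeing)
        if normalize and agreeing:
            # Divide by the total weight of the models that contributed here.
            value /= sum(w for _, w in agreeing)
        merged.append(value)
    return merged


def della_merge(base, finetuned, weights, density, epsilon=0.1, seed=0):
    """Merge fine-tuned parameter lists into `base` (all flat lists of floats)."""
    rng = random.Random(seed)
    pruned = []
    for params in finetuned:
        # Work on the difference from the base model, not the raw weights.
        deltas = [p - b for p, b in zip(params, base)]
        probs = magnitude_keep_probs(deltas, density, epsilon)
        pruned.append(drop_and_rescale(deltas, probs, rng))
    fused = elect_and_fuse(pruned, weights)
    return [b + f for b, f in zip(base, fused)]
```

In this sketch, two models with equal weights of 0.3 and normalization enabled each contribute with an effective weight of 0.5 wherever both survive the drop step and agree in sign.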

### Models Merged

The following models were included in the merge:
* [ValiantLabs/Qwen3-8B-ShiningValiant3](https://huggingface.co/ValiantLabs/Qwen3-8B-ShiningValiant3)
* [ValiantLabs/Qwen3-8B-Esper3](https://huggingface.co/ValiantLabs/Qwen3-8B-Esper3)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: della
dtype: bfloat16
parameters:
  normalize: true
models:
  - model: ValiantLabs/Qwen3-8B-Esper3
    parameters:
      density: 0.5
      weight: 0.3
  - model: ValiantLabs/Qwen3-8B-ShiningValiant3
    parameters:
      density: 0.5
      weight: 0.3
base_model: Qwen/Qwen3-8B
```
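
To reproduce the merge, the configuration above can be saved to a file and passed to mergekit's `mergekit-yaml` command. The sketch below wraps that call in Python; the helper names and the file and directory names in the usage note are hypothetical, and it assumes mergekit is installed and on the `PATH`.

```python
import subprocess


def build_merge_command(config_path, output_dir, use_cuda=False):
    """Return the argument list for mergekit's `mergekit-yaml` entry point."""
    command = ["mergekit-yaml", config_path, output_dir]
    if use_cuda:
        command.append("--cuda")  # run the merge arithmetic on a GPU
    return command


def run_merge(config_path, output_dir, use_cuda=False):
    """Run the merge; raises CalledProcessError if mergekit exits with an error."""
    subprocess.run(build_merge_command(config_path, output_dir, use_cuda), check=True)
```

For example, `run_merge("plumesper.yaml", "./PlumEsper")` would write the merged model to `./PlumEsper`, downloading the three source models first if they are not already cached.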