---
base_model:
- ValiantLabs/Qwen3-8B-ShiningValiant3
- ValiantLabs/Qwen3-8B-Esper3
- Qwen/Qwen3-8B
library_name: transformers
tags:
- mergekit
- merge
- qwen
- qwen-3
- qwen-3-8b
- 8b
- reasoning
- code
- code-reasoning
- code-instruct
- python
- javascript
- dev-ops
- jenkins
- terraform
- scripting
- powershell
- azure
- aws
- gcp
- cloud
- science
- science-reasoning
- physics
- biology
- chemistry
- earth-science
- astronomy
- machine-learning
- artificial-intelligence
- compsci
- computer-science
- information-theory
- ML-Ops
- math
- cuda
- deep-learning
- transformers
- agentic
- LLM
- neuromorphic
- self-improvement
- complex-systems
- cognition
- linguistics
- philosophy
- logic
- epistemology
- simulation
- game-theory
- knowledge-management
- creativity
- problem-solving
- architect
- engineer
- developer
- creative
- analytical
- expert
- rationality
- conversational
- chat
- instruct
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
- sequelbox/Mitakihara-DeepSeek-R1-0528
- sequelbox/Titanium2.1-DeepSeek-R1
- sequelbox/Tachibana2-DeepSeek-R1
- sequelbox/Raiden-DeepSeek-R1
---
# PlumEsper
PlumEsper is a merge of pre-trained language models created with [mergekit](https://github.com/cg123/mergekit). It combines the specialty and general reasoning skills of Esper 3 8B and Shining Valiant 3 8B.
## Merge Details
### Merge Method
This model was merged with the [DELLA](https://arxiv.org/abs/2406.11617) merge method, with [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as the base model.
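For intuition, here is a minimal sketch of the idea behind a DELLA-style merge, written on plain Python lists rather than real model tensors. It is a simplified illustration, not mergekit's implementation: the function names `magprune` and `della_like_merge`, the `epsilon` spread parameter, and the per-list ranking are assumptions made for this example. Each fine-tuned model is reduced to its delta from the base, low-magnitude deltas are dropped with a higher probability than high-magnitude ones, survivors are rescaled, and the remaining deltas are combined after a sign election.

```python
import random


def magprune(delta, density, epsilon=0.1, rng=None):
    """Magnitude-aware drop-and-rescale on one flat list of delta values.

    Hypothetical helper: `density` is the average keep rate, `epsilon` is the
    assumed spread of drop probabilities between small and large magnitudes.
    """
    rng = rng or random.Random(0)
    n = len(delta)
    p = 1.0 - density  # average drop probability
    # Rank by magnitude: rank 0 is the smallest |delta|, rank n-1 the largest.
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    out = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        # Smallest magnitudes get p + epsilon/2, largest get p - epsilon/2.
        p_i = min(max(p + epsilon / 2 - epsilon * frac, 0.0), 0.999)
        if rng.random() >= p_i:
            out[i] = delta[i] / (1.0 - p_i)  # rescale survivors
    return out


def della_like_merge(base, tuned, weights, density, epsilon=0.1, seed=0):
    """Merge fine-tuned parameter lists into `base` (simplified sketch)."""
    rng = random.Random(seed)
    pruned = [
        magprune([t - b for t, b in zip(model, base)], density, epsilon, rng)
        for model in tuned
    ]
    merged = []
    for j, b in enumerate(base):
        contribs = [w * d[j] for w, d in zip(weights, pruned)]
        # Sign election: the sign of the weighted sum wins.
        sign = 1 if sum(contribs) >= 0 else -1
        kept = [(w, c) for w, c in zip(weights, contribs) if c * sign > 0]
        # Normalization: divide by the weights that actually contributed.
        norm = sum(w for w, _ in kept)
        merged.append(b + (sum(c for _, c in kept) / norm if norm else 0.0))
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    esper_like = [0.2, -1.0, 0.5, 0.0]
    valiant_like = [0.1, -0.8, -0.4, 0.3]
    print(della_like_merge(base, [esper_like, valiant_like], [0.3, 0.3], 0.5))
```

In this sketch, two models with equal weights each contribute half of the merged delta wherever both surviving deltas agree in sign, which is the effect normalization has on equal weights.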
### Models Merged
The following models were included in the merge:
* [ValiantLabs/Qwen3-8B-ShiningValiant3](https://huggingface.co/ValiantLabs/Qwen3-8B-ShiningValiant3)
* [ValiantLabs/Qwen3-8B-Esper3](https://huggingface.co/ValiantLabs/Qwen3-8B-Esper3)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
merge_method: della
dtype: bfloat16
parameters:
  normalize: true
models:
  - model: ValiantLabs/Qwen3-8B-Esper3
    parameters:
      density: 0.5
      weight: 0.3
  - model: ValiantLabs/Qwen3-8B-ShiningValiant3
    parameters:
      density: 0.5
      weight: 0.3
base_model: Qwen/Qwen3-8B
```
```
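To reproduce a merge from a configuration like the one above, the file can be passed to mergekit's `mergekit-yaml` command-line entry point, which takes a config path and an output directory. The sketch below wraps that call in Python. The file name `plumesper.yaml`, the output directory, and the helper names `build_merge_command` and `run_merge` are assumptions for illustration; it also assumes mergekit is installed and that the models can be downloaded.

```python
import shlex
import shutil
import subprocess


def build_merge_command(config_path, output_dir):
    """Return the argument list for mergekit's `mergekit-yaml` command."""
    return ["mergekit-yaml", config_path, output_dir]


def run_merge(config_path="plumesper.yaml", output_dir="./Qwen3-8B-PlumEsper"):
    """Run the merge; both default paths are hypothetical placeholders."""
    if shutil.which("mergekit-yaml") is None:
        raise RuntimeError("mergekit-yaml not found on PATH; install mergekit first")
    cmd = build_merge_command(config_path, output_dir)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_merge()
```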