---
license: apache-2.0
language:
- en
base_model:
- gghfez/WizardLM-2-22b-RP
pipeline_tag: text-generation
library_name: transformers
tags:
- chat
- creative
- writing
- roleplay
---
|
|
|
EXL2 Quants of [gghfez/WizardLM-2-22B-RP](https://huggingface.co/gghfez/WizardLM-2-22B-RP) |
|
|
|
|BPW|Size|
|---|---|
|[8_0](https://huggingface.co/gghfez/WizardLM-2-22B-RP-exl2/tree/8_0)|18.2 GiB|
|[6_0](https://huggingface.co/gghfez/WizardLM-2-22B-RP-exl2/tree/6_0)|15.8 GiB|
|[5_0](https://huggingface.co/gghfez/WizardLM-2-22B-RP-exl2/tree/5_0)|13.2 GiB|
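
To use one of these quants, download the branch that fits your VRAM and load it with ExLlamaV2 (or a frontend such as TabbyAPI or text-generation-webui). The snippet below is a minimal sketch using the exllamav2 Python API; the local path, cache setting, and prompt are illustrative, not part of this repo.

```python
# Minimal sketch: load an EXL2 quant with the exllamav2 Python API.
# The model_dir path is illustrative - point it at a downloaded branch, e.g.
#   huggingface-cli download gghfez/WizardLM-2-22B-RP-exl2 --revision 6_0 --local-dir WizardLM-2-22B-RP-exl2-6_0
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "./WizardLM-2-22B-RP-exl2-6_0"

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocated while the model is auto-split across GPUs
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Once upon a time,", max_new_tokens=200, add_bos=True))
```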
|
|
|
Original Model Card: |
|
|
|
# gghfez/WizardLM2-22b-RP |
|
|
|
<img src="https://files.catbox.moe/acl4ld.png" width="400"/> |
|
|
|
⚠️ **IMPORTANT: Experimental Model - Not recommended for Production Use** |
|
- This is an experimental model created through bespoke, unorthodox merging techniques |
|
- The safety alignment and guardrails from the original WizardLM2 model may be compromised |
|
- This model is intended for creative writing and roleplay purposes ONLY |
|
- Use at your own risk and with appropriate content filtering in place |
|
|
|
|
|
This model is an experimental derivative of WizardLM2-8x22B, created by extracting the individual experts from the original mixture-of-experts (MoE) model, renaming the MLP modules to match the Mistral architecture, and merging them into a single dense model using mergekit's linear merge method.
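
The exact extraction scripts aren't included here, but conceptually the first step amounts to a state-dict rename. The sketch below is illustrative only: it assumes Hugging Face Mixtral-style expert weight names (`block_sparse_moe.experts.{j}.w1/w2/w3`) and maps one expert onto Mistral's dense `mlp.gate_proj/up_proj/down_proj` names so it can be treated as a plain dense checkpoint; the file paths and expert index are hypothetical.

```python
# Illustrative sketch (not the author's actual script): re-label one Mixtral expert's
# weights with dense Mistral MLP names so the extracted expert looks like a plain
# Mistral checkpoint that mergekit can then merge with the others.
import re
import torch

EXPERT = 0  # which expert to extract (hypothetical)
PROJ_MAP = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}  # Mixtral -> Mistral

def rename_key(key):
    m = re.match(r"model\.layers\.(\d+)\.block_sparse_moe\.experts\.(\d+)\.(w[123])\.weight", key)
    if m:
        layer, expert, proj = m.group(1), int(m.group(2)), m.group(3)
        if expert != EXPERT:
            return None                     # drop weights belonging to other experts
        return f"model.layers.{layer}.mlp.{PROJ_MAP[proj]}.weight"
    if ".block_sparse_moe.gate." in key:
        return None                         # the MoE router has no dense counterpart
    return key                              # attention, norms, embeddings pass through

moe_state = torch.load("wizardlm2-8x22b-shard.pt", map_location="cpu")  # hypothetical shard
dense_state = {new_key: tensor for key, tensor in moe_state.items()
               if (new_key := rename_key(key)) is not None}
torch.save(dense_state, f"expert{EXPERT}-dense-shard.pt")
```

Once every expert has been relabelled this way, mergekit's `linear` merge method can average them into the single dense model described above.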
|
|
|
The resulting model initially produced gibberish, but after fine-tuning on synthetic data generated by the original WizardLM2-8x22B, it regained the ability to generate relatively coherent text. However, the model remains confused about world knowledge and mixes up the names of well-known people.
|
|
|
Despite efforts to train the model on factual data, the confusion persisted, so instead I trained it for creative tasks. |
|
|
|
As a result, this model is not recommended for use as a general assistant or for tasks that require accurate real-world knowledge (don't bother running MMLU-Pro on it). |
|
|
|
It actually retrieves details from the provided context very accurately, but I still can't recommend it for anything other than creative tasks.
|
|
|
## Prompt format |
|
Mistral-V1 plus the system tags from Mistral-V7:

```
[SYSTEM_PROMPT] {system}[/SYSTEM_PROMPT] [INST] {prompt}[/INST]
```
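
As a worked example, applying this template with the full-precision model and transformers might look like the following; the sampling settings and prompt text are illustrative, not tuned recommendations.

```python
# Minimal sketch: format a prompt with the template above and generate with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gghfez/WizardLM-2-22B-RP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

system = "You are a creative co-writer for a noir detective story."
user = "Describe the detective's office on a rainy night."
prompt = f"[SYSTEM_PROMPT] {system}[/SYSTEM_PROMPT] [INST] {user}[/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```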
|
**NOTE:** This model is based on WizardLM2-8x22B, which is a fine-tune of Mixtral-8x22B, not to be confused with the more recent Mistral-Small-22B model.
|
As such, it uses the same vocabulary and tokenizer as Mixtral-v0.1 and inherits the Apache 2.0 license.
|
I expanded the vocabulary to include the system prompt and instruction tags before training (including the embedding and output head).
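
For readers who want to reproduce that kind of vocabulary expansion, a rough transformers equivalent is sketched below; the tag list, tokenizer source, and checkpoint path are assumptions, not the exact procedure used for this model.

```python
# Rough sketch of the vocabulary expansion described above (not the exact script used):
# register the control tags as added special tokens, then resize the embeddings and LM head.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x22B-v0.1")  # assumed tokenizer source
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[SYSTEM_PROMPT]", "[/SYSTEM_PROMPT]", "[INST]", "[/INST]"]}
)

model = AutoModelForCausalLM.from_pretrained("path/to/merged-dense-22b")  # hypothetical local checkpoint
model.resize_token_embeddings(len(tokenizer))  # grows both the input embeddings and the output head
```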
|
|
|
## Quants |
|
|
|
TODO |
|
|
|
## Examples: |
|
|
|
### Strength: Information Extraction from Context |
|
[example 1] |
|
|
|
### Weakness: Basic Factual Knowledge |
|
[example 2] |