---
base_model:
- mistralai/Pixtral-12B-2409
- TheDrummer/UnslopNemo-12B-v3
base_model_relation: merge
library_name: transformers
tags:
- mergekit
- merge
- multimodal
- mistral
- pixtral
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
license: other
pipeline_tag: image-text-to-text
---
# Razorback 12B v0.2

#### UnslopNemo with Vision!

A more robust attempt at merging TheDrummer's UnslopNemo v3 into Pixtral 12B. It has been very stable in my testing so far, though it needs more testing to determine which samplers it does and doesn't like. It seems to be the best of both worlds: less sloppy, more engaging output with solid intelligence and visual understanding.

## Merging Approach

First, I loaded Pixtral 12B Base and Mistral Nemo Base to compare their parameters. Looking at the L2 norm / relative difference of each parameter, I isolated which parts of Pixtral 12B deviate significantly from Mistral Nemo. Although the two share the same language-model architecture, a lot of vision understanding has been trained into Pixtral's language model and can break very easily.

Then I calculated a merging weight for each parameter using an exponential falloff: the smaller the difference, the higher the weight. I applied this recipe to Pixtral Instruct (Pixtral-12B-2409) and TheDrummer's UnslopNemo-12B-v3. The goal is to infuse as much Drummer goodness as possible without breaking vision input, and it looks like it worked!

## Usage

More testing is needed to identify the best sampling parameters, but so far a temperature of ~0.7 with min-p 0.03 has been rock solid.

Use the included chat template (Mistral). No ChatML support yet.

## Credits

- Mistral for [mistralai/Pixtral-12B-2409](https://huggingface.co/mistralai/Pixtral-12B-2409)
- Unsloth for [unsloth/Pixtral-12B-2409](https://huggingface.co/unsloth/Pixtral-12B-2409) transformers conversion
- TheDrummer for [TheDrummer/UnslopNemo-12B-v3](https://huggingface.co/TheDrummer/UnslopNemo-12B-v3)
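
## Appendix: Merge-Weight Sketch

The per-parameter weighting described in the Merging Approach section can be sketched roughly as below. This is an illustrative sketch under my own assumptions, not the exact recipe: the `falloff` constant, the relative-difference formula, and the function names are all hypothetical placeholders for whatever the actual merge used.

```python
import torch

def merge_weight(base_a: torch.Tensor, base_b: torch.Tensor,
                 falloff: float = 10.0) -> float:
    """Weight for pulling a parameter toward the donor model.

    The relative difference between the two base models' versions of a
    parameter gauges how much vision-specific training it carries. An
    exponential falloff maps small differences to weights near 1.0
    (safe to merge) and large differences to weights near 0.0
    (vision-critical, leave alone).
    """
    rel_diff = torch.norm(base_a - base_b) / (torch.norm(base_a) + 1e-8)
    return torch.exp(-falloff * rel_diff).item()

def merge_param(pixtral: torch.Tensor, donor: torch.Tensor,
                weight: float) -> torch.Tensor:
    # Linear interpolation: weight 1.0 takes the donor parameter outright,
    # weight 0.0 keeps Pixtral's vision-critical parameter untouched.
    return (1.0 - weight) * pixtral + weight * donor
```

In practice you would compute `merge_weight` from the two *base* models (Pixtral 12B Base vs. Mistral Nemo Base), then apply `merge_param` to the corresponding tensors of the two *instruct/finetune* models being merged.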