---
base_model: []
library_name: transformers
tags:
- mergekit
- merge
---
# Stellar Odyssey 12b v0.0

*Join my dream, it's just the right time, whoa... Leave it all behind... Get ready now... Riise up into my world~*

Listen to the song on YouTube: https://www.youtube.com/watch?v=npyiiInMA0w

This is my second attempt at a model merge. This time, these models were used:

- mistralai/Mistral-Nemo-Base-2407
- Sao10K/MN-12B-Lyra-v4
- nothingiisreal/MN-12B-Starcannon-v2
- Gryphe/Pantheon-RP-1.5-12b-Nemo

License for this model: Apache 2.0 (due to the base model, Mistral Nemo Base 2407)

Intended use case: Roleplay

Instruction format: ChatML

Thank you to AuriAetherwiing for helping me merge the models.

# Data?

This is a hard question to answer. I didn't add any data to the model itself; it is a merge of other models, so the data used to train them applies to this model too, though it won't be the same.

## Merge Details

This model was merged using the della_linear merge method, with mistralai/Mistral-Nemo-Base-2407 as the base.
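To give a rough feel for what a delta-based linear merge does, here is a toy sketch in plain Python. It is not mergekit's implementation. The function names and the `lam` argument are made up for this example, the "models" are short lists of floats instead of tensors, and entries are dropped uniformly at random. The real della_linear method adjusts drop probabilities by magnitude, which this sketch leaves out.

```python
import random


def drop_and_rescale(delta, density, rng):
    """Keep each entry with probability `density`; scale survivors by 1/density.

    Simplification: uniform random dropping, not magnitude-based dropping.
    """
    return [d / density if rng.random() < density else 0.0 for d in delta]


def toy_linear_delta_merge(base, finetuned, weights, densities, lam=1.0, seed=0):
    """Hypothetical helper: merge fine-tuned models into `base` via their deltas."""
    rng = random.Random(seed)
    merged_delta = [0.0] * len(base)
    for model, weight, density in zip(finetuned, weights, densities):
        # Difference between the fine-tuned model and the base model.
        delta = [m - b for m, b in zip(model, base)]
        # Sparsify the difference, then add it in with its merge weight.
        sparse = drop_and_rescale(delta, density, rng)
        merged_delta = [acc + weight * s for acc, s in zip(merged_delta, sparse)]
    # Scale the combined delta by `lam` and apply it on top of the base.
    return [b + lam * d for b, d in zip(base, merged_delta)]


# Made-up toy values, not real model weights.
base = [0.0, 1.0, 2.0, 3.0]
model_a = [0.5, 1.0, 2.5, 3.0]
model_b = [0.0, 2.0, 2.0, 4.0]

merged = toy_linear_delta_merge(
    base, [model_a, model_b], weights=[0.3, 0.4], densities=[0.5, 0.5]
)
print(merged)
```

With a density of 1.0 nothing is dropped and the result is simply the base plus the weighted sum of differences. Lower densities discard more of each difference and scale up what remains to compensate.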
### Models Merged

The following models were included in the merge:

* Sao10K/MN-12B-Lyra-v4
* Gryphe/Pantheon-RP-1.5-12b-Nemo
* nothingiisreal/MN-12B-Starcannon-v2

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: C:\Users\\Downloads\Mergekit-Fixed\mergekit\Sao10K_MN-12B-Lyra-v4
    parameters:
      weight: 0.3
      density: 0.25
  - model: C:\Users\\Downloads\Mergekit-Fixed\mergekit\nothingiisreal_MN-12B-Starcannon-v2
    parameters:
      weight: 0.1
      density: 0.4
  - model: C:\Users\\Downloads\Mergekit-Fixed\mergekit\Gryphe_Pantheon-RP-1.5-12b-Nemo
    parameters:
      weight: 0.4
      density: 0.5
merge_method: della_linear
base_model: C:\Users\\Downloads\Mergekit-Fixed\mergekit\mistralai_Mistral-Nemo-Base-2407
parameters:
  epsilon: 0.05
  lambda: 1
merge_method: della_linear
dtype: bfloat16
```

## Notes

Della_Linear: Refer to https://arxiv.org/abs/2406.11617 and https://arxiv.org/abs/2212.04089, as Della_Linear takes quite long to explain.

BFloat16: Brain Floating Point 16, a 16-bit floating-point format with the same exponent range as float32. It halves memory use compared to float32 and runs faster on GPUs with native support, such as recent Nvidia GPUs.

Density: Fraction of weights in differences from the base model to retain.

Epsilon: Maximum change in drop probability based on magnitude. Drop probabilities assigned will range