Finetune Qwen2.5 14b?

by ElvisM - opened

For 16GB VRAM cards, I think this one is probably SOTA for story writing, mostly because, unlike Llama and Mistral, it handles long context pretty well. Any plans to fine-tune it one day? It would be very much appreciated. I think I speak for most people when I say I'm kind of tired of running into problems once the context reaches 16k tokens.

@ElvisM

This is up next. I'm very impressed with the Qwen 2.5s; however, I will first try to emulate this approach (and Brainstorm too):

https://huggingface.co/DavidAU/DeepSeek-Grand-Horror-SMB-R1-Distill-Llama-3.1-16B-GGUF

This takes only the "reasoning/thinking" parts of DeepSeek and connects them to the core model.
The core model is fully retained, and augmented with DeepSeek's tech.
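For anyone curious what that looks like in practice, here is a minimal sketch of the general idea: grafting a handful of decoder layers from a DeepSeek R1 distill onto a base Llama model while keeping every base layer intact. The model IDs, the number of layers taken (N = 8), and the insertion point are all illustrative assumptions, not the actual recipe behind the Grand Horror build.

```python
# Hedged sketch: graft "reasoning" decoder layers from a DeepSeek R1 distill
# onto a base Llama model, keeping the base model's own layers fully intact.
# Model IDs, layer count, and insertion point are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Llama-3.1-8B-Instruct"           # assumed core model
DONOR_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed reasoning donor

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
donor = AutoModelForCausalLM.from_pretrained(DONOR_ID, torch_dtype=torch.bfloat16)

# Take the first N decoder layers from the donor; the guess here is that the
# early layers carry much of the distilled "thinking" behaviour.
N = 8
reasoning_layers = [donor.model.layers[i] for i in range(N)]

# Prepend them to the base stack so every original base layer is retained,
# then renumber layer_idx so KV-cache indexing stays consistent.
merged = reasoning_layers + list(base.model.layers)
for idx, layer in enumerate(merged):
    layer.self_attn.layer_idx = idx
base.model.layers = torch.nn.ModuleList(merged)
base.config.num_hidden_layers = len(merged)

# Save the enlarged model (the base parameters plus N extra layers).
base.save_pretrained("llama-3.1-with-r1-reasoning-layers")
AutoTokenizer.from_pretrained(BASE_ID).save_pretrained("llama-3.1-with-r1-reasoning-layers")
```

In practice this kind of layer surgery is usually done with a tool like mergekit and a passthrough merge, and the real build may differ; the snippet just shows the shape of the idea.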
