Finetune Qwen2.5 14b? #2
opened by ElvisM
For 16GB VRAM cards, I think this one is probably SOTA for story writing, mostly because, unlike Llama and Mistral, it handles long context pretty well. Any plans to fine-tune it one day? It would be very appreciated. I think I speak for most people when I say I'm kind of tired of having problems once the context reaches 16k tokens.
This is up next. Very impressed with the Qwen 2.5 models; however, I will first try to emulate this approach (and Brainstorm too):
https://huggingface.co/DavidAU/DeepSeek-Grand-Horror-SMB-R1-Distill-Llama-3.1-16B-GGUF
This takes only the "reasoning/thinking" parts of Deepseek and connects them to the core model.
The core model is fully retained, and augmented with Deepseek's tech.
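For anyone curious how a merge like that is typically wired up, a rough mergekit passthrough sketch might look like the following. The model names and layer ranges here are illustrative guesses, not the actual recipe used for the Grand Horror merge:

```yaml
# Hypothetical mergekit passthrough sketch: keep the core model intact
# and graft a slice of a DeepSeek R1 distill's layers on top of it.
# Model names and layer ranges below are placeholders, not the real recipe.
slices:
  - sources:
      - model: meta-llama/Llama-3.1-8B-Instruct          # core model, fully retained
        layer_range: [0, 32]
  - sources:
      - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B  # "reasoning" layers grafted on
        layer_range: [28, 32]
merge_method: passthrough
dtype: bfloat16
```

A passthrough merge like this simply stacks the selected layer slices, which is why the core model's weights survive unchanged while the distill's layers add the extra behavior.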