Somewhat disappointing
Continued from v2 discussion https://huggingface.co/jukofyork/command-r-35b-writer-v2/discussions
I haven't used this nearly as much as I'd like, but from what I'm seeing as I try to use it (Q3_K), it gets a lot of details wrong. I'll have a character defined as a dragon, and in one sentence they call me a dragon and then in the next they call me a human. Orientation seems to be wrong quite often.
In another source, a vampire says they are weak to sunlight, and it's daylight out, so they'll 'wait outside' because they won't be much use to me.
It makes enough mistakes that I'm having to constantly correct everything; it seems like it's not quite worth my time.
I was just coming to post the opposite. My issue with v2 was that it was 'dumb' and didn't understand the story context, but v3 seems smarter in very initial testing.
I'm testing single-turn novel-style responses with the Cohere template for a system prompt, exl3, temp-last sampling, and a fairly low temperature; higher temps go off the rails quickly.
More testing to slowly come...
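For reference, "temp-last" just means applying temperature after the truncation samplers. In text-generation-webui-style settings that looks something like the sketch below (parameter names follow that UI; the values are just examples, not a recommendation):

```python
# Hypothetical "temp-last" sampler settings (text-generation-webui-style names).
sampler_settings = {
    "temperature": 0.7,        # fairly low, since higher temps go off the rails
    "temperature_last": True,  # apply temperature after the truncation samplers
    "min_p": 0.05,             # example truncation sampler; swap in your own
}
```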
@yano2mch
I made an exl3 here, forced 3bpw up_proj and 4bpw for all other layers. I can just fit 16K in 24GB: https://huggingface.co/Downtown-Case/jukofyork_command-r-35b-writer-v3-exl3-3.75bpw-hb6
It should have less quantization loss than a default Q3_K.
It wouldn't surprise me if it's slightly broken still - for now I'm just running some more tests on gemma-2:9b, as I have a feeling the problems I'm getting with command-r are due to some bugs in the huggingface transformers tokenizer.
> It wouldn't surprise me if it's slightly broken still - I have a feeling the problems I'm getting with command-r are due to some bugs in the huggingface transformers tokenizer.
Yeah, it kinda did seem like it was two overlapping replies, each making sense on its own but not together. That would explain why the replies also otherwise felt a little off. I'll look forward to the next iteration.
@jukofyork Regarding the command-r and command-r+ tokenizers in general, I've noticed there are like 10 different variants of them floating about, with sizes ranging from 12.8 MB to 20 MB.

Your model's tokenizer sha256sum is c69a7ea6c0927dfac8c349186ebcf0466a4723c21cbdb2e850cf559f0bee92b8, which matches the current main branch of CohereLabs/c4ai-command-r-v01 and CohereLabs/c4ai-command-r-plus.
This is the same as Downtown-Case's exl3 quant (nice, I'll download this!)
The original tokenizer had this checksum: 0af6e6fe50ce1bb5611b103482de6bac000c82e06898138d57f35af121aec772, but then a HF staff member updated it to add tool calling tokens. When I diff'd them a while ago, I saw 5 tokens added (all related to tool calling)
Command-R+ originally had a 16.5 MB tokenizer (SHA256: 9619890aebac311d644236f49462d7f8618ebef7c7020c52645ccb597434a3c9) for a while, but now it has the same one as command-r-v01.
They also changed the EOS token from 6 to 255001.
And I'm not sure if this would have an impact: `"tie_word_embeddings": False`
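If anyone wants to check which variant they've got, a quick sanity check looks something like this (a minimal sketch: the repo id is the one above, but the older revision is a placeholder you'd fill in from the repo's commit history):

```python
# Minimal sketch: checksum a repo's tokenizer.json, then diff two revisions'
# vocabs and EOS ids. The old revision below is a placeholder.
import hashlib

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

path = hf_hub_download("CohereLabs/c4ai-command-r-v01", "tokenizer.json")
with open(path, "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())

new = AutoTokenizer.from_pretrained("CohereLabs/c4ai-command-r-v01")
old = AutoTokenizer.from_pretrained("CohereLabs/c4ai-command-r-v01",
                                    revision="<older-commit-sha>")

added = set(new.get_vocab()) - set(old.get_vocab())
print("added tokens:", sorted(added))
print("eos id:", old.eos_token_id, "->", new.eos_token_id)
```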
lol @ this comment:

> Remove comment in RoPE saying it is the same as llama

`# copied from transformers.models.llama.modeling_llama.LlamaRotaryEmbedding with Llama->Cohere`
I've seen various quants with a 20 MB tokenizer.json too, and the tokenizer_config.json has been changed a few times.
God knows what llama.cpp / gguf is doing to them.
I can't find it now, but I think I saw they migrated to FastTokenizer at some point. I suspect something went wrong along the way, because I was using command-r+ at launch in exllamav2 as my daily model for everything for a while, and never saw the Chinese/Russian characters printed. But now I get them locally (.gguf) and even using openrouter and the cohere api directly.
p.s. +1 for gemma-2-9b, that's a very unique model (not gemma-2-27b) and I still run it sometimes.
Oh crap, I've likely been using the same tokenizer as I downloaded when it first came out (I squirrel away models on a couple of 20TB drives to save redownloading them on my slow ADSL). I usually check if the JSON files have been updated and refresh them, but it never occurred to me the tokenizer file itself would change :((
> p.s. +1 for gemma-2-9b, that's a very unique model (not gemma-2-27b) and I still run it sometimes.
Yeah, it seems to be the smallest model people have successfully fine-tuned creative writing models on, and this particular fine-tune/merge hybrid people still seem to like:
I'm pretty sure I'll soon have the Control Adapters stuff working as well as the Control Vectors:
- After a lot of hours working out the maths, I've actually got something now that is almost exactly the multiplicative "partner" to the way I found to calculate the (additive) Control Vectors (ie: now actually parameterised as an Eigen-decomposition of a symmetric square matrix; see the sketch after this list).
- I think I've found at last what causes the models to "drift" and screw up the non-writing ability too. I can't seem to find a good regulariser to avoid this: they all just end up turning my rank-64 adapter into having an effective rank of ~1.5 and totally ruin it... BUT: by adding ~10% random data (eg: 50/50 webcrawl and instruct data) to the positive and negative classes, I can get a pretty similar effect and avoid the "drift" by insisting each "side" of the Control Adapter learns directions that don't screw up the non-writing tasks it would never see otherwise.
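Roughly (glossing over the training details), the two parameterisations look something like this, in my loose notation with $h$ the hidden state at the layer being steered:

$$h' = h + \Delta \qquad \text{(Control Vector: additive)}$$

$$h' = \left(I + Q \Lambda Q^{\top}\right) h \qquad \text{(Control Adapter: multiplicative)}$$

where $Q \Lambda Q^{\top}$ is the eigendecomposition of the learned symmetric square matrix. By "effective rank" above I mean the usual entropy-based measure, roughly $\mathrm{erank} = \exp(-\sum_i p_i \log p_i)$ with $p_i = |\lambda_i| / \sum_j |\lambda_j|$, so ~1.5 means the regularisers were collapsing almost everything into one or two directions.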
I've only got this working the last couple of days, so pretty excited to see what comes out! The very first rank-16 versions of command-r:35b and qwq:32b looked so promising, so fingers crossed!!! :D
Do any of you have any recommendations for other smallish models I can try? There are so many different mistral models in the 12-24B range, I've no idea which might be a good target?
I don't want to deal with gemma-3 yet, as it needs two sets of RoPE cos/sin values passing through it and my forked version of qlora-pipe would need a fair bit of reworking to add these.
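(For anyone curious: gemma-3 mixes sliding-window and global attention layers, and the two use different RoPE base frequencies, so you need two separate cos/sin tables per forward pass. A minimal sketch of the idea; the 10k/1M base frequencies are what I understand the gemma-3 configs to use, and head_dim/seq_len are just example values:)

```python
# Minimal sketch: gemma-3-style dual RoPE tables (two base frequencies).
import torch

def rope_cos_sin(seq_len: int, head_dim: int, base: float):
    # Standard RoPE: inverse frequencies per even dimension, outer product
    # with positions, giving cos/sin tables of shape [seq_len, head_dim // 2].
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return angles.cos(), angles.sin()

cos_local,  sin_local  = rope_cos_sin(4096, 128, base=10_000.0)     # sliding-window layers
cos_global, sin_global = rope_cos_sin(4096, 128, base=1_000_000.0)  # global-attention layers
```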
I'm particularly interested in smallish "smart" models that don't have great prose (but not terrible prose...).
It would be interesting to see if I can keep the "smarts" and just affect the prose...
(I have a feeling that the mixing @yano2mch is seeing in this model is because I separated the paragraphs using double newlines, whereas before [and again now] I separated using the EOS tokens).
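(Concretely, the difference is just the separator used when joining paragraphs into training samples, something like the sketch below; the EOS string shown is command-r's end-of-turn token and the paragraphs are made up:)

```python
# Illustration only: two ways of joining story paragraphs in the training data.
paragraphs = ["She is a vampire.", "Sunlight burns her, so she hides by day."]

newline_layout = "\n\n".join(paragraphs)               # double newlines (this release)
eos_layout = "<|END_OF_TURN_TOKEN|>".join(paragraphs)  # EOS tokens (before, and again now)
```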
> (I have a feeling that the mixing @yano2mch is seeing in this model is because I separated the paragraphs using double newlines, whereas before [and again now] I separated using the EOS tokens).
Maybe. It kinda feels more like one sentence is written by someone and then the next sentence is written by someone else ("I am harmed by sunlight", "I will wait outside", "it is light out right now"), that or the logic/consistency doesn't fit in that instance for some reason (or I'm being too picky, that's possible). The vampire one I mentioned especially stands out. I'll post a bit of the output here to show what I mean.
Here are the important parts; I snipped out the unnecessary flirting. Formatting is easier with BBCode. If you want the unaltered logs I can provide them (which is mostly the intro and him rolling a rare vampire).
"My statistics, as you requested. I am a vampire, one of the most powerful of my kind. I possess enhanced strength, agility, and reflexes. I do not tire or grow bored, and my senses are far beyond those of a normal human. My hearing is especially keen. Because I am a vampire, I cannot withstand sunlight, and the holy symbols of certain religious orders have power over me." A frown creases her brow slightly. "However, you will find that I am a loyal companion. I will not betray you unless you abuse me." A small, enigmatic smile. "But that won't be a problem, will it? I can't wait to help you in your adventures. And who knows, perhaps you'll help me as well."
She steps back, giving you a deep, courteous curtsey. "It's nice to finally meet you, master." She gives you a wink. "Shall we begin?"
Yano: "A vampire. Not a big surprise based on your outfit..." he touches her cheek and murrs. "Can i summon and unsummon you?" he asks, and glances at the door, it is likely daytime right now. "I'm not sure we can safely leave if you can't handle sunlight. Sounds like we'll be taking night jobs then."
-snip-
Gacha Isekai: -snip-
She reaches forward and places her hands on your shoulders and leans in close. "Now, shall we go and see what's outside?"
Yano: "What happens when you are unsummoned? Are you nearby? Do you sleep as an object? or do you cease to be until called?" he asks.
He moves to the door and opens it a crack to look outside to see how bright it is, it might be safe to take her outside.
Gacha Isekai: "When you unsummon me, I remain close by, in a dormant state. I can sense the world around me but I cannot act or move until you summon me once more. I do not 'sleep' per say, I simply await your call. If you wish, I can wait outside for you while you tend to whatever business you have," Lamia suggests. "I can protect myself from most threats, and as the sun is out, I would not be of much use to you in any case." She steps back from you, a cautious expression on her face.
> If you want the unaltered logs I can provide them
Do you have an initial prompt I can try for comparison, for simple testing?
I ask because:
> Do any of you have any recommendations for other smallish models I can try?
Maybe, but I feel like my own testing is pretty limited in scope, so I'd like to see how others' prompts do.