SuperNintendoChalmers

AI & ML interests

None yet

SuperNintendoChalmers's activity

liked a dataset 2 days ago

eaddario/imatrix-calibration

Viewer • Updated 12 days ago • 907k • 13.6k • 2

New activity in mradermacher/model_requests 19 days ago

allura-org/L3.1-8b-RP-Ink

#713 opened 19 days ago by

SuperNintendoChalmers

New activity in mradermacher/model_requests 20 days ago

FreedomIntelligence/Apollo-MoE-7B

#709 opened 20 days ago by

SuperNintendoChalmers

New activity in mradermacher/model_requests 22 days ago

Tulu 3.1

#687 opened 25 days ago by

SuperNintendoChalmers

liked 2 models 23 days ago

TeeZee/Buttocks-7B-v1.1

Text Generation • Updated Mar 4, 2024 • 245 • 3

TeeZee/Buttocks-7B-v1.0

Text Generation • Updated Mar 4, 2024 • 243 • 4

New activity in mradermacher/model_requests 25 days ago

Any value in static quants?

#688 opened 25 days ago by

SuperNintendoChalmers

New activity in motexture/cData about 1 month ago

Very cool! Also how?

#2 opened about 1 month ago by

SuperNintendoChalmers

New activity in mradermacher/model_requests about 1 month ago

motexture's cData models

#654 opened about 1 month ago by

SuperNintendoChalmers

liked a dataset about 1 month ago

motexture/cData

Viewer • Updated Nov 6, 2024 • 1k • 82 • 2

New activity in bartowski/phi-4-GGUF about 1 month ago

Regenerate quants to include bugfixes?

#5 opened about 1 month ago by

SuperNintendoChalmers

New activity in bartowski/OLMo-2-1124-13B-Instruct-GGUF about 1 month ago

Re-generate quants for new model with same name

#3 opened about 1 month ago by

SuperNintendoChalmers

New activity in microsoft/phi-4 about 1 month ago

Request to fix bugs quicker next time

#37 opened about 1 month ago by

SuperNintendoChalmers

New activity in mradermacher/model_requests 2 months ago

nvidia/Hymba-1.5B-Instruct

#557 opened 2 months ago by

SuperNintendoChalmers

liked 2 datasets 3 months ago

RJ1200/Programming_C

Viewer • Updated Oct 14, 2024 • 5.58k • 8 • 1

bigcode/programming-languages-keywords

Viewer • Updated Mar 6, 2023 • 36 • 99 • 7

New activity in k-mktr/gpu-poor-llm-arena 3 months ago

Granite 3 MoE 1B returns all 4's

#2 opened 5 months ago by

Falcon 3 10B broken?

#5 opened 3 months ago by

SuperNintendoChalmers

reacted to bartowski's post with 👍 3 months ago

Post

Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method: online repacking (I prefer the term on-the-fly). If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance. I think there are minor losses from using intrinsics instead of assembly, but intrinsics are more maintainable.

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speed (though it may currently be bugged on some platforms).

As such, I'll stop making those newer model formats soon, probably at the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

IQ4_NL also supports repacking, though not in as many shapes yet; it should still get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since those chips use the GPU and don't benefit from the repacking of weights.
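
To make the "interleaved rows" idea from the post concrete, here is a minimal Python sketch. It is an illustration only and not llama.cpp's actual memory layout or API: the function name `interleave_rows`, the group size of 4, and the use of strings as stand-ins for quantized blocks are all assumptions made for the example. The point is just that blocks from several consecutive rows are stored side by side, so one pass can process the same block position of all rows in a group.

```python
def interleave_rows(rows, group=4):
    """Repack `rows` (each a list of blocks) so that the blocks of `group`
    consecutive rows are stored next to each other.

    Hypothetical helper for illustration; not part of llama.cpp.
    """
    if len(rows) % group != 0:
        raise ValueError("row count must be a multiple of the group size")
    packed = []
    for start in range(0, len(rows), group):
        chunk = rows[start:start + group]
        n_blocks = len(chunk[0])
        if any(len(row) != n_blocks for row in chunk):
            raise ValueError("all rows in a group must have the same block count")
        # Walk block positions first, then rows, so block b of every row
        # in the group ends up contiguous in the output.
        for b in range(n_blocks):
            for row in chunk:
                packed.append(row[b])
    return packed


# Four rows of two blocks each; the labels stand in for quantized blocks.
rows = [[f"r{r}b{b}" for b in range(2)] for r in range(4)]
packed = interleave_rows(rows)
print(packed)
```

With on-the-fly repacking as the post describes it, a rearrangement of this kind happens when the model is loaded instead of being baked into a separate file type.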

liked a Space 3 months ago

GPU Poor LLM Arena

Compact LLM Battle Arena: Frugal AI Face-Off!