Can't run in llama.cpp, wrong tensor shape

#1
opened by bartowski

Opened a bug here since I saw the same issue with my own quants:

https://github.com/ggml-org/llama.cpp/issues/12376

It converts and quantizes with no problem, but fails to run:

llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
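
For anyone hitting this: you can confirm what shape is actually stored in the file with the gguf package from llama.cpp's gguf-py (pip install gguf). A minimal sketch, with a hypothetical local path:

# Print the stored shapes of the attention norm tensors in a GGUF file.
# Uses the gguf package from llama.cpp's gguf-py; the path is hypothetical.
from gguf import GGUFReader

reader = GGUFReader("OLMo-2-0325-32B-Instruct-Q8_0.gguf")
for tensor in reader.tensors:
    if "attn_k_norm" in tensor.name or "attn_q_norm" in tensor.name:
        print(tensor.name, list(tensor.shape))

If it matches this thread, attn_k_norm should print as [1024] per layer, i.e. the file stores the GQA-sized tensor and it's the loader's 5120 expectation that trips.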

Hey @bartowski, is this issue only for the Q3 quants?

No, it's for all sizes, sadly!

BF16 also failed in the same way

I'll download Q8_0 to be extra sure, but I think it's safe to say it applies to all quants if it happens to BF16

Yup, Q8_0 breaks in the same way, @amanrangapur

Yep, can confirm! Interestingly, HF is fine. I think GGUF isn't registering the K_norm size correctly due to grouped-query attention.

I'm assuming llama.cpp expects the K norm and Q norm to be of the same shape, maybe? I.e. that Q/K norm cannot be used with GQA, but I'm unsure.
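
For what it's worth, the shape arithmetic lines up with GQA being the culprit. A quick sketch, assuming OLMo-2 32B's usual config values (hidden size 5120, 40 query heads, 8 KV heads; those numbers are my assumption, but they match the 5120 vs 1024 in the error):

# With grouped-query attention, the K projection (and thus a K norm applied
# to it) only spans the KV heads, not the full hidden size.
# Config values below are assumed, not taken from the thread.
hidden_size = 5120
n_heads = 40        # query heads
n_kv_heads = 8      # KV heads (GQA)
head_dim = hidden_size // n_heads      # 128

q_norm_width = n_heads * head_dim      # 5120 -> the "expected 5120"
k_norm_width = n_kv_heads * head_dim   # 1024 -> the "got 1024"
print(q_norm_width, k_norm_width)

If that's right, the 1024-wide k_norm in the checkpoint is consistent, and it's the loader's hidden_size expectation that doesn't account for GQA.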

load_tensors: layer  64 assigned to device CUDA0, is_swa = 0
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected  5120, got  1024,     1,     1,     1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv    load_model: failed to load model, '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

I think I have the same problem.

🥲 Failed to load the model

Failed to load model

error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected  5120, got  1024,     1,     1,     1

Same here

Fixed in llama.cpp#12400

load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected  5120, got  1024,     1,     1,     1
llama_model_load_from_file_impl: failed to load model
terminate called without an active exception
Aborted

still cannot run

Did you apply the PR? It's not merged yet.

Merged, will be in b4896 when done.

Unfortunately there are still some issues.

While it's now able to run inference, the imatrix ends up with some NaNs, I think:

blk.42.attn_k.weight - [ 5120,  1024,     1,     1], type =   bf16, converting to q4_K .. ggml_validate_row_data: found nan value at block 40
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 40
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 40
llama_model_quantize: failed to quantize: quantized data validation failed

Will have to post another bug for that :') but thanks for the fix of the main issue!
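
If anyone wants to check whether the NaNs already exist in the BF16 weights or only show up during the imatrix/quantization path, here's a rough scan from Python (my own sketch, hypothetical path; it only covers unquantized tensor types, roughly what ggml_validate_row_data checks):

# Scan an unquantized GGUF for non-finite values.
# The path and the BF16 handling are assumptions on my part.
import numpy as np
from gguf import GGUFReader
from gguf.constants import GGMLQuantizationType

reader = GGUFReader("OLMo-2-0325-32B-Instruct-BF16.gguf")
for t in reader.tensors:
    if t.tensor_type == GGMLQuantizationType.BF16:
        # gguf-py hands BF16 back as raw bits; widen to float32 by hand
        bits = np.frombuffer(t.data.tobytes(), dtype=np.uint16)
        values = (bits.astype(np.uint32) << 16).view(np.float32)
    elif np.asarray(t.data).dtype.kind == "f":   # F16/F32/F64
        values = np.asarray(t.data, dtype=np.float32)
    else:
        continue  # quantized block types would need dequantizing first
    bad = ~np.isfinite(values)
    if bad.any():
        print(f"{t.name}: {int(bad.sum())} non-finite values")

If the BF16 file comes back clean, the NaNs are being introduced somewhere during imatrix/quantization rather than being present in the source weights.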
