Can't run in llama.cpp, wrong tensor shape
Opened a bug here since I saw the same issue with my own quants:
https://github.com/ggml-org/llama.cpp/issues/12376
It converts and quantizes with no problem, but fails to run.
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
Hey @bartowski, is this issue only for the Q3 quants?
No, it's for all sizes, sadly!
BF16 also failed in the same way
I'll download Q8_0 to be extra sure, but I think it's safe to say it applies to all quants if it happens to BF16
Yup, Q8_0 breaks in the same way @amanrangapur
Yep, can confirm! Interestingly, HF is fine. I think the GGUF path isn't registering the K_norm size correctly due to grouped-query attention.
I'm assuming llama.cpp expects the K norm and Q norm to be of the same shape? I.e. Q/K norm can't currently be loaded alongside GQA, but I'm unsure.
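For anyone curious, the shape arithmetic works out like this (a rough sketch in Python; the head counts are my assumption based on OLMo-2-32B's config, while the 5120/1024 numbers come straight from the error message):

```python
# Sketch of the shape mismatch. Head counts are assumed
# (40 query heads, 8 KV heads); 5120/1024 are from the error log.
n_embd    = 5120              # hidden size
n_head    = 40                # query heads
n_head_kv = 8                 # KV heads (grouped-query attention)
head_dim  = n_embd // n_head  # 128

q_norm_width = n_head * head_dim     # 40 * 128 = 5120, equals n_embd
k_norm_width = n_head_kv * head_dim  # 8 * 128 = 1024, what the GGUF stores

# If the loader checks attn_k_norm against n_embd (5120), that only
# holds when n_head_kv == n_head, i.e. without GQA:
assert q_norm_width == n_embd
assert k_norm_width != n_embd  # 1024 != 5120 -> check_tensor_dims fails
```

So if the loader assumes the K norm is n_embd wide, any GQA model with QK-norm would trip this check.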
load_tensors: layer 64 assigned to device CUDA0, is_swa = 0
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv load_model: failed to load model, '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
I think I've got the same problem.
🥲 Failed to load the model
Failed to load model
error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
Same here
Fixed in llama.cpp#12400
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
llama_model_load_from_file_impl: failed to load model
terminate called without an active exception
Aborted
Still cannot run.
Did you apply the PR? It's not merged yet, so you'd need to fetch the branch yourself (e.g. `git fetch origin pull/12400/head && git checkout FETCH_HEAD` in your llama.cpp checkout) and rebuild.
Unfortunately there are still some issues.
While it's now able to run inference, the imatrix ends up with some NaNs, I think:
blk.42.attn_k.weight - [ 5120, 1024, 1, 1], type = bf16, converting to q4_K .. ggml_validate_row_data: found nan value at block 40
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 40
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 40
llama_model_quantize: failed to quantize: quantized data validation failed
Will have to post another bug for that :'), but that's separate from the fix for the main issue!
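For reference, the failed validation above is ggml refusing to write quantized blocks that contain non-finite values. A minimal NumPy stand-in for that kind of check (just the idea, not ggml's actual implementation):

```python
import numpy as np

def validate_row_data(row: np.ndarray) -> int | None:
    """Return the index of the first non-finite value in a row, else None.

    Simplified stand-in for ggml_validate_row_data, which scans quantized
    blocks for NaN/Inf and makes llama_model_quantize abort on failure.
    """
    bad = np.flatnonzero(~np.isfinite(row))
    return int(bad[0]) if bad.size else None

# A weight row where one value picked up a NaN upstream
# (e.g. propagated from a bad imatrix entry):
row = np.ones(64, dtype=np.float32)
row[40] = np.nan
idx = validate_row_data(row)
if idx is not None:
    print(f"found nan value at index {idx}")  # analogous to the log lines above
```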