Can't run in llama.cpp, wrong tensor shape
Opened a bug here since I saw the same issue with my own quants:
https://github.com/ggml-org/llama.cpp/issues/12376
It converts and quantizes with no problem, but fails to run.
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
Hey @bartowski, is this issue only for the Q3 quants?
No, it's for all sizes, sadly!
BF16 also failed in the same way
I'll download Q8_0 to be extra sure, but I think it's safe to say it applies to all quants if it happens to BF16
Yup, Q8_0 breaks in the same way @amanrangapur
Yep, can confirm! Interestingly, HF is fine. I think the GGUF path isn't registering the K_norm size correctly due to grouped-query attention.
I'm assuming llama.cpp expects the K norm and Q norm to be of the same shape? I.e. Q/K norm can't currently be loaded alongside GQA, but I'm unsure.
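For anyone curious, the shape arithmetic works out like this (a rough sketch in Python; the head counts are my assumption based on OLMo-2-32B's config, while the 5120/1024 numbers come straight from the error message):

```python
# Sketch of the shape mismatch. Head counts are assumed
# (40 query heads, 8 KV heads); 5120/1024 are from the error log.
n_embd    = 5120              # hidden size
n_head    = 40                # query heads
n_head_kv = 8                 # KV heads (grouped-query attention)
head_dim  = n_embd // n_head  # 128

q_norm_width = n_head * head_dim     # 40 * 128 = 5120, equals n_embd
k_norm_width = n_head_kv * head_dim  # 8 * 128 = 1024, what the GGUF stores

# If the loader checks attn_k_norm against n_embd (5120), that only
# holds when n_head_kv == n_head, i.e. without GQA:
assert q_norm_width == n_embd
assert k_norm_width != n_embd  # 1024 != 5120 -> check_tensor_dims fails
```

So if the loader assumes the K norm is n_embd wide, any GQA model with QK-norm would trip this check.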
load_tensors: layer 64 assigned to device CUDA0, is_swa = 0
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv load_model: failed to load model, '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
I think I've got the same problem.
🥲 Failed to load the model
Failed to load model
error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
Same here
Fixed in llama.cpp#12400
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
llama_model_load_from_file_impl: failed to load model
terminate called without an active exception
Aborted
Still cannot run.
Did you apply the PR? It's not merged yet, so you'd need to fetch the branch yourself (e.g. `git fetch origin pull/12400/head && git checkout FETCH_HEAD` in your llama.cpp checkout) and rebuild.
Unfortunately there are still some issues.
While it's now able to run inference, the imatrix ends up with some NaNs, I think:
blk.42.attn_k.weight - [ 5120, 1024, 1, 1], type = bf16, converting to q4_K .. ggml_validate_row_data: found nan value at block 40
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 40
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 20
ggml_validate_row_data: found nan value at block 40
llama_model_quantize: failed to quantize: quantized data validation failed
Will have to post another bug for that :'), but that's separate from the fix for the main issue!
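For reference, the failed validation above is ggml refusing to write quantized blocks that contain non-finite values. A minimal NumPy stand-in for that kind of check (just the idea, not ggml's actual implementation):

```python
import numpy as np

def validate_row_data(row: np.ndarray) -> int | None:
    """Return the index of the first non-finite value in a row, else None.

    Simplified stand-in for ggml_validate_row_data, which scans quantized
    blocks for NaN/Inf and makes llama_model_quantize abort on failure.
    """
    bad = np.flatnonzero(~np.isfinite(row))
    return int(bad[0]) if bad.size else None

# A weight row where one value picked up a NaN upstream
# (e.g. propagated from a bad imatrix entry):
row = np.ones(64, dtype=np.float32)
row[40] = np.nan
idx = validate_row_data(row)
if idx is not None:
    print(f"found nan value at index {idx}")  # analogous to the log lines above
```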