
“'token_embd.weight' has wrong shape” when loading deepseek-coder-1.3b-base.Q8_0.gguf #5910

Closed
ashiqguntupalli opened this issue Mar 6, 2024 · 9 comments

Comments

@ashiqguntupalli

ashiqguntupalli commented Mar 6, 2024

OS: Ubuntu 22.04.1
GPU: Nvidia 3060
llama.cpp version: 2350

I have downloaded the deepseek-coder-1.3b-base model from Hugging Face and converted it to GGUF format using the convert.py script. Initially the conversion failed with a vocab size mismatch: Vocab size mismatch (model has 31999, but deepseekai_deepseekcoder_1p3_hf has 32022). After debugging I found that there are some additional vocab entries that are not counted in the vocab size; after changing the vocab size from 31999 to 32022 in config.json, convert.py worked as expected.
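For anyone hitting the same mismatch, the counts that convert.py compares can be checked up front. A minimal sketch, assuming the transformers library is installed and using the local checkpoint directory name from the log below:

```python
from transformers import AutoConfig, AutoTokenizer

model_dir = "deepseekai_deepseekcoder_1p3_hf"  # local HF checkpoint directory

config = AutoConfig.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# config.vocab_size is the number of embedding rows the checkpoint was built with;
# len(tokenizer) counts added special tokens, tokenizer.vocab_size does not.
print("config.vocab_size    :", config.vocab_size)
print("tokenizer.vocab_size :", tokenizer.vocab_size)
print("len(tokenizer)       :", len(tokenizer))
```

If these three numbers disagree, the conversion will either refuse to run or write inconsistent metadata.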

After the conversion, I loaded the model with the command ./main -m deepseekcoder_1p3_q8_0.gguf -n 128, which failed with llama_model_load: error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 2048, 32022, got 2048, 32256, 1, 1. For more information, see the log below.

Log start
main: build = 2350 (bd836944)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1709755816
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3060'
llama_model_loader: loaded meta data with 24 key-value pairs and 219 tensors from ../../download_huggingface/deepseekai_deepseekcoder_1p3_hf/ggml-model-q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = download_huggingface
llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 24
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5504
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 16
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0,000001
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 100000,000000
llama_model_loader: - kv  11:                    llama.rope.scaling.type str              = linear
llama_model_loader: - kv  12:                  llama.rope.scaling.factor f32              = 4,000000
llama_model_loader: - kv  13:                          general.file_type u32              = 7
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32022]   = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32022]   = [-1000,000000, -1000,000000, -1000,00...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32022]   = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 32013
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 32021
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 32014
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
llama_model_loader: - type  f32:   49 tensors
llama_model_loader: - type q8_0:  170 tensors
llm_load_vocab: SPM vocabulary, but newline token not found: _Map_base::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 9/32022 vs 22/32022 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32022
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 16384
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0,0e+00
llm_load_print_meta: f_norm_rms_eps   = 1,0e-06
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: n_ff             = 5504
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 100000,0
llm_load_print_meta: freq_scale_train = 0,25
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 1,35 B
llm_load_print_meta: model size       = 1,33 GiB (8,50 BPW) 
llm_load_print_meta: general.name     = download_huggingface
llm_load_print_meta: BOS token        = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token        = 32021 '<|EOT|>'
llm_load_print_meta: UNK token        = 0 '!'
llm_load_print_meta: PAD token        = 32014 '<|end▁of▁sentence|>'
llm_load_tensors: ggml ctx size =    0,08 MiB
llama_model_load: error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  2048, 32022, got  2048, 32256,     1,     1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../../download_huggingface/deepseekai_deepseekcoder_1p3_hf/ggml-model-q8_0.gguf'
main: error: unable to load model

I guess in my case the conversion might have broken the model. I have seen a similar issue in #2894, but with the convert-hf-to-gguf.py script. So is this issue already known, or is it only happening to me?

FYI, I have tested my llama.cpp build with a GGUF model from TheBloke and that worked like a charm.
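To check whether the converted file itself carries the mismatch, the vocab count stored in the GGUF metadata can be compared against the shape of the stored embedding tensor. A minimal sketch, assuming the gguf Python package that ships with llama.cpp (gguf-py) and its GGUFReader API:

```python
from gguf import GGUFReader

reader = GGUFReader("ggml-model-q8_0.gguf")  # path to the converted file

# n_vocab is derived from the token list in the metadata...
tokens_field = reader.fields["tokenizer.ggml.tokens"]
print("tokens in metadata:", len(tokens_field.data))

# ...and token_embd.weight must have exactly that many rows.
for tensor in reader.tensors:
    if tensor.name == "token_embd.weight":
        print("token_embd.weight shape:", tensor.shape)
```

On the file from this issue, the two numbers should come out as 32022 and (2048, 32256), which is exactly the mismatch reported in the error.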

@ashiqguntupalli
Author

I am still struggling with the same error. Does anyone have the same or a similar issue?

@FotieMConstant

I have a similar issue with a fine-tune of Llama2-7b:

llama_model_load: error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 4096, 32001, got 4096, 32000,

Been on this for weeks too :(

@dozzky

dozzky commented Mar 18, 2024

> I am still struggling with the same error. Does anyone have the same or a similar issue?

Any updates? I'm having a similar issue with a merged model (via mergekit-moe, codellama and sambalingo).

[1710765483] llama_model_load: error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 4096, 32004, got 4096, 32000, 1, 1

I also changed vocab_size manually before converting (from 32000 to 32004).

@dozzky

dozzky commented Mar 19, 2024

I deleted the added_token.json file and manually changed vocab_size in config.json (before converting) so that the conversion works without --pad-vocab. That worked for me; it seems the additional tokens were being counted because of that file.
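For reference, a sketch of that workaround. The directory name is hypothetical, the filename added_token.json is as given above, and since the comment doesn't say which value vocab_size was set to, the 32000 below (the tensor's actual row count from the error) is an assumption:

```python
import json
import os

model_dir = "merged_model"  # hypothetical path to the HF checkpoint

# Remove the added-token file so those entries are no longer counted
# on top of the base vocabulary during conversion.
added = os.path.join(model_dir, "added_token.json")
if os.path.exists(added):
    os.remove(added)

# Set vocab_size in config.json to match the actual embedding row count.
cfg_path = os.path.join(model_dir, "config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["vocab_size"] = 32000  # assumed: the row count reported in the error above
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```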

@ashiqguntupalli
Author

@dozzky Codellama works without any changes on my side, but the problem with the deepseek-coder-1.3b-base model still persists.

@dozzky

dozzky commented Mar 20, 2024

@ashiqguntupalli Can you post a screenshot of the model folder before converting it to GGUF? Also, your token_embd.weight actually contains 32256 rows (not the 31999 that you previously changed to 32022); it's strange to see an extra 200+. In my case the added_token.json file had 4 lines (exactly the number I needed to remove before converting).

@ashiqguntupalli
Author

@dozzky I have the following configuration JSON files:

  1. config.json
  2. generation_config.json
  3. tokenizer.json
  4. tokenizer_config.json

I have changed the vocab_size from 32256 to 32022 in config.json, but the error still persists.

@liqinga

liqinga commented Mar 27, 2024

I have the same issue

@github-actions github-actions bot added the stale label Apr 27, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
