[User] “'token_embd.weight' has wrong shape” when loading WizardLM-Uncensored-Falcon-40b #2894

Closed
rlanday opened this issue Aug 30, 2023 · 18 comments
rlanday commented Aug 30, 2023

Expected Behavior

I have been trying to run my favorite model, WizardLM-Uncensored-Falcon-40b, in llama.cpp now that it has Falcon support (until now I have been running it in ggllm.cpp). Since it is a derivative of a standard Falcon model, I expected it to work in llama.cpp.

Link to the model:
https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b

Current Behavior


I have tried multiple times (on different revisions) to convert the model to GGUF format using the latest code available:

python convert-falcon-hf-to-gguf.py /Volumes/Storage/ML\ models/WizardLM-Uncensored-Falcon-40b/ 1

This script runs successfully. However, every time I try to run the resulting model (or a quantized version thereof), I get this error:

error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 8192, 65024, got 8192, 65025, 1, 1

Apparently there is one extra token (padding?) in the embedding table that llama.cpp is not expecting.
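
For reference, here is a minimal sketch (not part of llama.cpp or the conversion script, and assuming the transformers package is installed) of how one might confirm the off-by-one between the tokenizer vocabulary and the number of embedding rows saved in the HF checkpoint:

```python
# Hypothetical sanity check, not part of convert-falcon-hf-to-gguf.py:
# compare the vocab_size stored in config.json (the number of embedding rows
# the checkpoint was saved with) against the number of tokens the tokenizer
# actually defines. Based on the error above, they appear to differ by one.
import json
from pathlib import Path

from transformers import AutoTokenizer

model_dir = Path("/Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b")

config = json.loads((model_dir / "config.json").read_text())
tokenizer = AutoTokenizer.from_pretrained(model_dir)

print("config.json vocab_size :", config.get("vocab_size"))  # embedding rows in the checkpoint
print("tokenizer token count  :", len(tokenizer))             # tokens the converter writes to GGUF
```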

Environment and Context


  • Physical (or virtual) hardware you are using, e.g. for Linux:

rlanday@Ryans-MBP-2 llama.cpp % sysctl -a | grep machdep.cpu
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 286531872
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 1
machdep.cpu.xsave.extended_state: 31 832 1088 0
machdep.cpu.xsave.extended_state1: 15 832 256 0
machdep.cpu.arch_perf.version: 4
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 4
machdep.cpu.cache.size: 256
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.small_level1: 64
machdep.cpu.address_bits.physical: 39
machdep.cpu.address_bits.virtual: 48
machdep.cpu.tsc_ccc.numerator: 192
machdep.cpu.tsc_ccc.denominator: 2
machdep.cpu.max_basic: 22
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
machdep.cpu.family: 6
machdep.cpu.model: 158
machdep.cpu.extmodel: 9
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 13
machdep.cpu.feature_bits: 9221960262849657855
machdep.cpu.leaf7_feature_bits: 43804591 1073741824
machdep.cpu.leaf7_feature_bits_edx: 3154120192
machdep.cpu.extfeature_bits: 1241984796928
machdep.cpu.signature: 591597
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT SGXLC MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 248
machdep.cpu.processor_flag: 5
machdep.cpu.core_count: 8
machdep.cpu.thread_count: 16

  • Operating System, e.g. for Linux:

rlanday@Ryans-MBP-2 llama.cpp % uname -a
Darwin Ryans-MacBook-Pro-2.local 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:22 PDT 2023; root:xnu-8796.121.3~7/RELEASE_X86_64 x86_64

  • SDK version, e.g. for Linux:
rlanday@Ryans-MBP-2 llama.cpp % python3 --version
Python 3.11.4

rlanday@Ryans-MBP-2 llama.cpp % make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

rlanday@Ryans-MBP-2 llama.cpp % g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)


Steps to Reproduce


  1. Download the model at https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b
  2. Convert it to GGUF format using python convert-falcon-hf-to-gguf.py <path to model> 1 (a sketch for verifying the resulting file follows these steps)
  3. Attempt to run inference using the model, e.g.
./main --ctx_size 16384 -m <path to model>  --top_p 0 --top_k 40 --temp 0.7 --repeat_penalty 1.176470588235294 -t 8 -n -1 --repeat_last_n 256 -p "Please come up with a plan to fix San Francisco.

"

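To double-check what step 2 actually wrote into the file, one could dump the shape of token_embd.weight from the converted GGUF. A minimal sketch, assuming a version of the gguf Python package (llama.cpp's gguf-py) that provides GGUFReader:

```python
# Hypothetical inspection of the converted file; assumes the installed gguf
# package exposes GGUFReader, which may not be true for all versions.
from gguf import GGUFReader

reader = GGUFReader("/Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf")

for tensor in reader.tensors:
    if tensor.name == "token_embd.weight":
        # The error above reports this tensor as [8192, 65025], while llama.cpp
        # expects [8192, 65024] based on the tokenizer metadata in the same file.
        print(tensor.name, list(tensor.shape))
```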
Failure Logs

rlanday@Ryans-MBP-2 llama.cpp % ./main --ctx_size 16384 -m /Volumes/Storage/ML\ models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf  --top_p 0 --top_k 40 --temp 0.7 --repeat_penalty 1.176470588235294 -t 8 -n -1 --repeat_last_n 256 -p "Please come up with a plan to fix San Francisco.

lscpu"
main: warning: base model only supports context sizes no greater than 2048 tokens (16384 specified)
main: build = 1119 (06abf8e)
main: seed  = 1693374095
llama_model_loader: loaded meta data with 18 key-value pairs and 484 tensors from /Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf (version GGUF V1 (support until nov 2023))
llama_model_loader: - tensor    0:                token_embd.weight f16      [  8192, 65025,     1,     1 ]
llama_model_loader: - tensor    1:         blk.0.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor    2:           blk.0.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor    3:           blk.0.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor    4:             blk.0.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor    7:              blk.0.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor    8:            blk.0.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor    9:         blk.1.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   10:           blk.1.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   11:           blk.1.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   12:             blk.1.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   13:            blk.1.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   14:         blk.1.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   15:              blk.1.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   16:            blk.1.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   17:         blk.2.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   18:           blk.2.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   19:           blk.2.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   20:             blk.2.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   21:            blk.2.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   22:         blk.2.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   23:              blk.2.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   24:            blk.2.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   25:         blk.3.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   26:           blk.3.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   27:           blk.3.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   28:             blk.3.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   29:            blk.3.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   30:         blk.3.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   31:              blk.3.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   32:            blk.3.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   33:         blk.4.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   34:           blk.4.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   35:           blk.4.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   36:             blk.4.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   37:            blk.4.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   38:         blk.4.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   39:              blk.4.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   40:            blk.4.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   41:         blk.5.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   42:           blk.5.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   43:           blk.5.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   44:             blk.5.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   45:            blk.5.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   46:         blk.5.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   47:              blk.5.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   48:            blk.5.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   49:         blk.6.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   50:           blk.6.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   51:           blk.6.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   52:             blk.6.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   53:            blk.6.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   54:         blk.6.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   55:              blk.6.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   56:            blk.6.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   57:         blk.7.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   58:           blk.7.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   59:           blk.7.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   60:             blk.7.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   61:            blk.7.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   62:         blk.7.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   63:              blk.7.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   64:            blk.7.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   65:         blk.8.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   66:           blk.8.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   67:           blk.8.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   68:             blk.8.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   69:            blk.8.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   70:         blk.8.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   71:              blk.8.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   72:            blk.8.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   73:         blk.9.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   74:           blk.9.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   75:           blk.9.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   76:             blk.9.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   77:            blk.9.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   78:         blk.9.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   79:              blk.9.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   80:            blk.9.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   81:        blk.10.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   82:          blk.10.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   83:          blk.10.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   84:            blk.10.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   85:           blk.10.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   86:        blk.10.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   87:             blk.10.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   88:           blk.10.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   89:        blk.11.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   90:          blk.11.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   91:          blk.11.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   92:            blk.11.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   93:           blk.11.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor   94:        blk.11.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   95:             blk.11.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor   96:           blk.11.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor   97:        blk.12.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   98:          blk.12.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   99:          blk.12.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  100:            blk.12.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  101:           blk.12.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  102:        blk.12.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  103:             blk.12.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  104:           blk.12.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  105:        blk.13.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  106:          blk.13.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  107:          blk.13.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  108:            blk.13.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  109:           blk.13.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  110:        blk.13.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  111:             blk.13.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  112:           blk.13.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  113:        blk.14.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  114:          blk.14.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  115:          blk.14.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  116:            blk.14.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  117:           blk.14.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  118:        blk.14.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  119:             blk.14.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  120:           blk.14.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  121:        blk.15.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  122:          blk.15.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  123:          blk.15.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  124:            blk.15.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  125:           blk.15.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  126:        blk.15.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  127:             blk.15.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  128:           blk.15.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  129:        blk.16.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  130:          blk.16.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  131:          blk.16.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  132:            blk.16.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  133:           blk.16.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  134:        blk.16.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  135:             blk.16.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  136:           blk.16.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  137:        blk.17.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  138:          blk.17.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  139:          blk.17.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  140:            blk.17.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  141:           blk.17.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  142:        blk.17.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  143:             blk.17.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  144:           blk.17.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  145:        blk.18.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  146:          blk.18.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  147:          blk.18.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  148:            blk.18.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  149:           blk.18.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  150:        blk.18.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  151:             blk.18.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  152:           blk.18.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  153:        blk.19.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  154:          blk.19.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  155:          blk.19.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  156:            blk.19.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  157:           blk.19.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  158:        blk.19.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  159:             blk.19.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  160:           blk.19.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  161:        blk.20.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  162:          blk.20.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  163:          blk.20.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  164:            blk.20.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  165:           blk.20.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  166:        blk.20.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  167:             blk.20.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  168:           blk.20.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  169:        blk.21.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  170:          blk.21.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  171:          blk.21.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  172:            blk.21.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  173:           blk.21.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  174:        blk.21.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  175:             blk.21.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  176:           blk.21.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  177:        blk.22.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  178:          blk.22.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  179:          blk.22.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  180:            blk.22.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  181:           blk.22.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  182:        blk.22.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  183:             blk.22.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  184:           blk.22.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  185:        blk.23.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  186:          blk.23.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  187:          blk.23.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  188:            blk.23.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  189:           blk.23.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  190:        blk.23.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  191:             blk.23.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  192:           blk.23.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  193:        blk.24.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  194:          blk.24.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  195:          blk.24.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  196:            blk.24.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  197:           blk.24.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  198:        blk.24.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  199:             blk.24.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  200:           blk.24.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  201:        blk.25.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  202:          blk.25.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  203:          blk.25.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  204:            blk.25.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  205:           blk.25.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  206:        blk.25.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  207:             blk.25.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  208:           blk.25.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  209:        blk.26.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  210:          blk.26.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  211:          blk.26.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  212:            blk.26.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  213:           blk.26.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  214:        blk.26.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  215:             blk.26.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  216:           blk.26.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  217:        blk.27.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  218:          blk.27.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  219:          blk.27.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  220:            blk.27.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  221:           blk.27.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  222:        blk.27.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  223:             blk.27.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  224:           blk.27.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  225:        blk.28.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  226:          blk.28.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  227:          blk.28.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  228:            blk.28.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  229:           blk.28.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  230:        blk.28.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  231:             blk.28.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  232:           blk.28.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  233:        blk.29.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  234:          blk.29.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  235:          blk.29.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  236:            blk.29.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  237:           blk.29.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  238:        blk.29.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  239:             blk.29.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  240:           blk.29.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  241:        blk.30.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  242:          blk.30.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  243:          blk.30.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  244:            blk.30.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  245:           blk.30.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  246:        blk.30.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  247:             blk.30.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  248:           blk.30.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  249:        blk.31.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  250:          blk.31.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  251:          blk.31.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  252:            blk.31.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  253:           blk.31.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  254:        blk.31.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  255:             blk.31.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  256:           blk.31.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  257:        blk.32.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  258:          blk.32.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  259:          blk.32.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  260:            blk.32.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  261:           blk.32.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  262:        blk.32.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  263:             blk.32.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  264:           blk.32.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  265:        blk.33.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  266:          blk.33.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  267:          blk.33.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  268:            blk.33.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  269:           blk.33.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  270:        blk.33.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  271:             blk.33.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  272:           blk.33.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  273:        blk.34.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  274:          blk.34.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  275:          blk.34.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  276:            blk.34.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  277:           blk.34.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  278:        blk.34.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  279:             blk.34.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  280:           blk.34.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  281:        blk.35.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  282:          blk.35.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  283:          blk.35.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  284:            blk.35.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  285:           blk.35.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  286:        blk.35.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  287:             blk.35.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  288:           blk.35.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  289:        blk.36.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  290:          blk.36.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  291:          blk.36.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  292:            blk.36.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  293:           blk.36.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  294:        blk.36.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  295:             blk.36.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  296:           blk.36.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  297:        blk.37.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  298:          blk.37.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  299:          blk.37.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  300:            blk.37.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  301:           blk.37.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  302:        blk.37.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  303:             blk.37.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  304:           blk.37.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  305:        blk.38.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  306:          blk.38.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  307:          blk.38.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  308:            blk.38.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  309:           blk.38.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  310:        blk.38.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  311:             blk.38.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  312:           blk.38.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  313:        blk.39.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  314:          blk.39.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  315:          blk.39.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  316:            blk.39.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  317:           blk.39.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  318:        blk.39.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  319:             blk.39.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  320:           blk.39.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  321:        blk.40.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  322:          blk.40.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  323:          blk.40.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  324:            blk.40.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  325:           blk.40.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  326:        blk.40.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  327:             blk.40.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  328:           blk.40.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  329:        blk.41.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  330:          blk.41.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  331:          blk.41.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  332:            blk.41.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  333:           blk.41.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  334:        blk.41.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  335:             blk.41.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  336:           blk.41.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  337:        blk.42.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  338:          blk.42.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  339:          blk.42.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  340:            blk.42.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  341:           blk.42.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  342:        blk.42.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  343:             blk.42.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  344:           blk.42.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  345:        blk.43.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  346:          blk.43.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  347:          blk.43.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  348:            blk.43.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  349:           blk.43.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  350:        blk.43.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  351:             blk.43.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  352:           blk.43.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  353:        blk.44.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  354:          blk.44.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  355:          blk.44.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  356:            blk.44.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  357:           blk.44.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  358:        blk.44.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  359:             blk.44.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  360:           blk.44.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  361:        blk.45.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  362:          blk.45.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  363:          blk.45.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  364:            blk.45.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  365:           blk.45.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  366:        blk.45.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  367:             blk.45.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  368:           blk.45.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  369:        blk.46.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  370:          blk.46.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  371:          blk.46.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  372:            blk.46.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  373:           blk.46.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  374:        blk.46.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  375:             blk.46.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  376:           blk.46.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  377:        blk.47.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  378:          blk.47.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  379:          blk.47.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  380:            blk.47.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  381:           blk.47.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  382:        blk.47.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  383:             blk.47.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  384:           blk.47.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  385:        blk.48.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  386:          blk.48.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  387:          blk.48.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  388:            blk.48.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  389:           blk.48.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  390:        blk.48.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  391:             blk.48.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  392:           blk.48.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  393:        blk.49.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  394:          blk.49.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  395:          blk.49.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  396:            blk.49.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  397:           blk.49.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  398:        blk.49.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  399:             blk.49.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  400:           blk.49.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  401:        blk.50.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  402:          blk.50.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  403:          blk.50.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  404:            blk.50.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  405:           blk.50.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  406:        blk.50.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  407:             blk.50.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  408:           blk.50.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  409:        blk.51.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  410:          blk.51.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  411:          blk.51.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  412:            blk.51.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  413:           blk.51.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  414:        blk.51.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  415:             blk.51.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  416:           blk.51.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  417:        blk.52.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  418:          blk.52.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  419:          blk.52.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  420:            blk.52.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  421:           blk.52.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  422:        blk.52.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  423:             blk.52.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  424:           blk.52.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  425:        blk.53.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  426:          blk.53.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  427:          blk.53.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  428:            blk.53.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  429:           blk.53.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  430:        blk.53.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  431:             blk.53.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  432:           blk.53.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  433:        blk.54.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  434:          blk.54.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  435:          blk.54.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  436:            blk.54.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  437:           blk.54.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  438:        blk.54.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  439:             blk.54.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  440:           blk.54.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  441:        blk.55.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  442:          blk.55.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  443:          blk.55.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  444:            blk.55.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  445:           blk.55.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  446:        blk.55.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  447:             blk.55.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  448:           blk.55.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  449:        blk.56.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  450:          blk.56.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  451:          blk.56.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  452:            blk.56.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  453:           blk.56.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  454:        blk.56.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  455:             blk.56.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  456:           blk.56.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  457:        blk.57.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  458:          blk.57.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  459:          blk.57.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  460:            blk.57.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  461:           blk.57.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  462:        blk.57.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  463:             blk.57.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  464:           blk.57.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  465:        blk.58.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  466:          blk.58.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  467:          blk.58.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  468:            blk.58.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  469:           blk.58.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  470:        blk.58.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  471:             blk.58.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  472:           blk.58.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  473:        blk.59.attn_norm_2.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  474:          blk.59.attn_norm_2.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  475:          blk.59.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  476:            blk.59.attn_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  477:           blk.59.attn_qkv.weight f16      [  8192,  9216,     1,     1 ]
llama_model_loader: - tensor  478:        blk.59.attn_output.weight f16      [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  479:             blk.59.ffn_up.weight f16      [  8192, 32768,     1,     1 ]
llama_model_loader: - tensor  480:           blk.59.ffn_down.weight f16      [ 32768,  8192,     1,     1 ]
llama_model_loader: - tensor  481:               output_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  482:                 output_norm.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  483:                    output.weight f16      [  8192, 65025,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                      falcon.context_length u32     
llama_model_loader: - kv   3:                  falcon.tensor_data_layout str     
llama_model_loader: - kv   4:                    falcon.embedding_length u32     
llama_model_loader: - kv   5:                 falcon.feed_forward_length u32     
llama_model_loader: - kv   6:                         falcon.block_count u32     
llama_model_loader: - kv   7:                falcon.attention.head_count u32     
llama_model_loader: - kv   8:             falcon.attention.head_count_kv u32     
llama_model_loader: - kv   9:        falcon.attention.layer_norm_epsilon f32     
llama_model_loader: - kv  10:                          general.file_type u32     
llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
llama_model_loader: - kv  12:                      tokenizer.ggml.merges arr     
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32     
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32     
llama_model_loader: - type  f32:  242 tensors
llama_model_loader: - type  f16:  242 tensors
llm_load_print_meta: format         = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch           = falcon
llm_load_print_meta: vocab type     = BPE
llm_load_print_meta: n_vocab        = 65024
llm_load_print_meta: n_merges       = 64784
llm_load_print_meta: n_ctx_train    = 2048
llm_load_print_meta: n_ctx          = 16384
llm_load_print_meta: n_embd         = 8192
llm_load_print_meta: n_head         = 128
llm_load_print_meta: n_head_kv      = 8
llm_load_print_meta: n_layer        = 60
llm_load_print_meta: n_rot          = 64
llm_load_print_meta: n_gqa          = 16
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 32768
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 40B
llm_load_print_meta: model ftype    = mostly F16
llm_load_print_meta: model size     = 41.84 B
llm_load_print_meta: general.name   = Falcon
llm_load_print_meta: BOS token = 1 '>>ABSTRACT<<'
llm_load_print_meta: EOS token = 2 '>>INTRODUCTION<<'
llm_load_print_meta: LF token  = 193 '
'
llm_load_tensors: ggml ctx size =    0.16 MB
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  8192, 65024, got  8192, 65025,     1,     1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf'
main: error: unable to load model

Additional environment info:

rlanday@Ryans-MBP-2 llama.cpp % git log | head -1

commit 06abf8eebabe086ca4003dee2754ab45032cd3fd

rlanday@Ryans-MBP-2 llama.cpp % pip list | egrep "torch|numpy|sentencepiece"
numpy              1.24.0
sentencepiece      0.1.98
torch              2.0.1
@KerfuffleV2
Collaborator

Not related to your current issue but

llm_load_print_meta: BOS token = 1 '>>ABSTRACT<<'
llm_load_print_meta: EOS token = 2 '>>INTRODUCTION<<'

is definitely wrong. In the base model it should be token id 11 for both (<|endoftext|>).

@akawrykow
Contributor

If you change this line:

vocab_size = len(tokenizer_json["model"]["vocab"])

to:

vocab_size = hparams["vocab_size"]

then re-convert, does it work?

I have noticed in other Falcon models (e.g. falcon-rw-1b) that the stated vocab size from the config doesn't match the number of tokens in tokenizer.json. The script already pads with arbitrary tokens -- in this case, only a single extra token is added.
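A quick way to check whether a given checkpoint has this mismatch is to compare the two numbers directly. This is a standalone diagnostic sketch (not part of the convert script); it assumes the usual Hugging Face layout with config.json and tokenizer.json in the model directory:

# Diagnostic sketch: compare the config's stated vocab_size with the number of
# entries in the tokenizer's vocab table. Assumes config.json and tokenizer.json
# sit in the model directory passed on the command line.
import json
import sys

model_dir = sys.argv[1]

with open(f"{model_dir}/config.json", encoding="utf-8") as f:
    hparams = json.load(f)
with open(f"{model_dir}/tokenizer.json", encoding="utf-8") as f:
    tokenizer_json = json.load(f)

config_vocab = hparams["vocab_size"]                     # e.g. 65025 here
tokenizer_vocab = len(tokenizer_json["model"]["vocab"])  # e.g. 65024 here

print(f"config.json vocab_size      : {config_vocab}")
print(f"tokenizer.json vocab entries: {tokenizer_vocab}")
print(f"tokens the script must pad  : {config_vocab - tokenizer_vocab}")

If the two numbers differ, the checkpoint's embedding tensor is presumably sized to the config's value while the GGUF metadata gets the tokenizer-derived count, which matches the shape mismatch in the error above.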

@akawrykow
Contributor

@KerfuffleV2 FWIW, the config specifies token IDs 1 and 2: https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/blob/main/config.json#L14-L15

which do map to '>>ABSTRACT<<' and '>>INTRODUCTION<<': https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/raw/main/tokenizer.json

There is, however, an additional tokenizer_config.json which seems to override EOS: https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/blob/main/tokenizer_config.json#L4. I don't think we're accounting for this in the script, but I couldn't find anything similar for BOS.

@akawrykow
Contributor

I also noticed that tokenizer_config.json specifies:

  "padding": {
    "strategy": "BatchLongest",
    "direction": "Right",
    "pad_to_multiple_of": null,
    "pad_id": 65024,
    "pad_type_id": 0,
    "pad_token": "[PAD]"
  },

This extra token would indeed account for the one missing token. We don't handle it explicitly in the script, but our existing padding inserts [PAD0], so: close enough?
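For context, the padding being referred to looks roughly like this (an illustrative sketch only, not the convert script's exact code; the placeholder naming scheme is an assumption made for the example):

# Sketch of the padding idea: extend the token list to the config's vocab_size
# with placeholder entries. The "[PAD{i}]" naming here is illustrative, not
# necessarily what the script emits.
def pad_vocab(tokens, vocab_size):
    padded = list(tokens)
    for i in range(len(tokens), vocab_size):
        padded.append(f"[PAD{i}]".encode("utf-8"))
    return padded

For this model there are 65024 real tokens and a config vocab_size of 65025, so exactly one placeholder is appended, standing in for the tokenizer's actual [PAD] token at id 65024.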

@KerfuffleV2
Collaborator

FWIW, the config specifies token IDs 1 and 2

although there is an additional tokenizer_config.json which seems to override EOS

Weird. When extracting special token IDs like BOS/EOS, tokenizer.json and tokenizer_config.json are checked first, and config.json is only used as a fallback.
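Roughly, that lookup order is the following (a simplified sketch of the idea for EOS only, not the gguf package's actual implementation; which file carries which field varies between models):

# Simplified sketch of the special-token lookup order, EOS only.
# Not the gguf package's real code; field locations vary between models.
import json
from pathlib import Path

def find_eos_id(model_dir):
    model_dir = Path(model_dir)

    # 1) tokenizer_config.json (and tokenizer.json's added_tokens) come first
    tok_cfg = model_dir / "tokenizer_config.json"
    if tok_cfg.exists():
        eos = json.loads(tok_cfg.read_text(encoding="utf-8")).get("eos_token")
        if isinstance(eos, dict):
            eos = eos.get("content")
        if eos is not None:
            tokenizer = json.loads((model_dir / "tokenizer.json").read_text(encoding="utf-8"))
            added = {t["content"]: t["id"] for t in tokenizer.get("added_tokens", [])}
            if eos in added:
                return added[eos]

    # 2) config.json's eos_token_id is only a fallback
    hparams = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    return hparams.get("eos_token_id")

For this model that order is what makes the difference: config.json says EOS is 2, but tokenizer_config.json points at <|endoftext|>, which maps to id 11.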

@akawrykow
Contributor

@KerfuffleV2 actually it seems like it should work now after #2842.

It seems like previously we just used the IDs from config.json, but now there is an extra step of looking at the added_tokens in tokenizer_config.json. For the model in question, the EOS token at least is there, so it should get mapped back to the correct ID.

I wonder whether publishing the updated module to pip, installing it, and then re-running the convert script would make this work.

@akawrykow
Contributor

Yes, confirmed after upgrading the gguf package and adding an extra print:

gguf: Setting special token type eos to 11

@KerfuffleV2 @ggerganov does it make sense to fall back to BOS = EOS when we have a 'special' EOS token? Is that a convention these models are following implicitly?

@KerfuffleV2
Collaborator

does it make sense to fallback to BOS = EOS when we have a 'special' EOS token?

Unfortunately, I don't know enough to answer that question. It sounds kind of reasonable, but it probably really depends on how the model is trained.

I wonder if you publish the updated module to pip

I don't have that capability (but I should have done a better job of making sure that happened in sync with my changes). Hopefully #2916 fixed your issues. Sorry about the breakage!

@rlanday
Contributor Author

rlanday commented Aug 31, 2023

I updated to the latest gguf and revision 92d0b75 and verified that llama.cpp now produces this output when loading the converted model:

llm_load_print_meta: general.name   = Falcon
llm_load_print_meta: BOS token = 11 '<|endoftext|>'
llm_load_print_meta: EOS token = 11 '<|endoftext|>'
llm_load_print_meta: LF token  = 193 '
'
llm_load_tensors: ggml ctx size =    0.16 MB
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  8192, 65024, got  8192, 65025,     1,     1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf'
main: error: unable to load model

I then applied the change in #2894 (comment) and reconverted the model, and was able to get it working (the model loads and produces coherent output).

@Philipp-Sc

I am having a very similar issue, but I am using convert-llama-hf-to-gguf.py:

llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32001
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 2048
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly F16 (guessed)
llm_load_print_meta: model size     = 6.74 B
llm_load_print_meta: general.name   = merged_adapters_11300
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.09 MB
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  4096, 32001, got  4096, 32000,     1,     1

There is no obvious equivalent to vocab_size = hparams["vocab_size"] in convert-llama-hf-to-gguf.py.

Is there a fix required for convert-llama-hf-to-gguf.py? (If not, it's probably a configuration mistake on my part.)

@KerfuffleV2
Collaborator

is there a fix required for convert-llama-hf-to-gguf.py?

I think it just doesn't work currently. Try using the main convert.py script. There's a pull request to remove those convert-llama scripts since apparently they are non-functional.

@akawrykow
Contributor

I updated to the latest gguf and revision 92d0b75 and verified that llama.cpp now produces this output when loading the converted model:

[... same output and 'token_embd.weight' shape error as quoted above ...]

I then applied the change in #2894 (comment) and reconverted the model, and was able to get it working (the model loads and produces coherent output).

cc @ggerganov, shall we merge #2914?

@Philipp-Sc

@KerfuffleV2 convert.py fails for another reason; any idea what this is about?

ubuntu@host:~/llama.cpp$ python3 convert.py ../merged_adapters_11300/
Traceback (most recent call last):
  File "convert.py", line 533, in <module>
    LazyModel = dict[str, LazyTensor]
TypeError: 'type' object is not subscriptable

ubuntu@host:~/llama.cpp$ ls ../merged_adapters_11300
added_tokens.json  generation_config.json            pytorch_model-00002-of-00002.bin  special_tokens_map.json  tokenizer.model
config.json        pytorch_model-00001-of-00002.bin  pytorch_model.bin.index.json      tokenizer.json           tokenizer_config.json

Thanks in advance.

@KerfuffleV2
Collaborator

convert.py fails for another reason, any idea what this is about?

Uhhh, actually, can't blame me for that one! Looks like @cebtenzzre changed it from Dict to dict in #2916. If you add Dict to the list of imports from typing near the top and change that line to use Dict rather than dict, does it work?

Or, maybe simpler as a quick fix, I think you can just make it LazyModel = dict (type-checking tools won't like this, but it shouldn't have a runtime impact).
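For anyone else hitting this, both quick fixes look roughly like this around that line in convert.py (illustrative only; the root cause is that subscripting the builtin dict, as in dict[str, LazyTensor], requires Python 3.9 or newer):

# Option A: Python 3.8-compatible typing via typing.Dict
from typing import Dict
LazyModel = Dict[str, LazyTensor]

# Option B: drop the subscript entirely; type checkers lose precision,
# but there is no runtime impact
LazyModel = dict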

@Philipp-Sc

@KerfuffleV2 Thanks for your quick reply. I found the issue: I had to update Python to version 3.9.

Everything works now :)

@cebtenzzre
Collaborator

I think you can just make it LazyModel = dict

You can actually remove that line entirely if you just want it to run; it's only used by the type checker.

Fixed in PR #2949.

Contributor

github-actions bot commented Apr 5, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 5, 2024
@Jingy2000

I found a related discussion here that might be helpful:

https://huggingface.co/TheBloke/CodeLlama-7B-Python-GGUF/discussions/1
