
Converting kfkas Llama-2-ko-7b-Chat to GGUF fails #2865

Closed
kurugai opened this issue Aug 29, 2023 · 34 comments
Comments

@kurugai

kurugai commented Aug 29, 2023

Hi. I'm trying to convert the 'kfkas/Llama-2-ko-7b-Chat' model I downloaded from Hugging Face into a GGUF file on Windows 11.
I ran the command below to do the conversion.

C:\AI\llama.cpp>python convert-llama-hf-to-gguf.py .\models\kfkas_Llama-2-ko-7b-Chat 1

The conversion succeeded, but the resulting file could not be loaded when I tried to run it.

Could you take a look and tell me what I should do? The output of the commands is below.

I know you're busy, but I'd appreciate it if you could take a look.


C:\AI\llama.cpp>pip install gguf
Defaulting to user installation because normal site-packages is not writeable
Collecting gguf
Obtaining dependency information for gguf from https://files.pythonhosted.org/packages/bb/16/83a1cb95d9ec85bc316a1986481325c257a4a9a024e12bace801898db14e/gguf-0.2.1-py3-none-any.whl.metadata
Downloading gguf-0.2.1-py3-none-any.whl.metadata (1.9 kB)
Requirement already satisfied: numpy>=1.17 in c:\users\hwyoo\appdata\roaming\python\python310\site-packages (from gguf) (1.23.5)
Downloading gguf-0.2.1-py3-none-any.whl (8.1 kB)
Installing collected packages: gguf
Successfully installed gguf-0.2.1

C:\AI\llama.cpp>python convert-llama-hf-to-gguf.py .\models\kfkas_Llama-2-ko-7b-Chat 1
gguf: loading model kfkas_Llama-2-ko-7b-Chat
gguf: found 2 model parts
gguf: get model metadata
gguf: get tokenizer metadata
gguf: get special token ids
gguf: get tensor metadata
gguf: loading model part 'pytorch_model-00001-of-00002.bin'
token_embd.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
gguf: loading model part 'pytorch_model-00002-of-00002.bin'
blk.23.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.24.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.24.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.24.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.24.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.24.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.25.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.25.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.25.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.25.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.25.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.26.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.26.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.26.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.26.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.26.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.27.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.27.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.27.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.27.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.27.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.28.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.28.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.28.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.28.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.28.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.29.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.29.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.29.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.29.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.29.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.30.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.30.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.30.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.30.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.30.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.31.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.31.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.31.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.31.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.31.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
output_norm.weight, n_dims = 1, torch.float16 --> float32
output.weight, n_dims = 2, torch.float16 --> float16
gguf: write header
gguf: write metadata
gguf: write tensors
gguf: model successfully exported to '.\models\kfkas_Llama-2-ko-7b-Chat/ggml-model-f16.gguf'

C:\AI\llama.cpp>main
main: build = 1100 (dd0dc36)
main: seed = 1693289567
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from models/7B/ggml-model-f16.gguf (version GGUF V1L����.llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 46336, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 19: blk.2.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 20: blk.2.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.3.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 29: blk.3.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 37: blk.4.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 38: blk.4.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 46: blk.5.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 47: blk.5.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 50: blk.5.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 51: blk.5.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 52: blk.5.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 53: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.6.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 56: blk.6.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 57: blk.6.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 58: blk.6.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 59: blk.6.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 60: blk.6.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 61: blk.6.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 62: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 63: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.7.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 65: blk.7.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 66: blk.7.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 67: blk.7.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 68: blk.7.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 69: blk.7.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 70: blk.7.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 71: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 72: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 73: blk.8.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 74: blk.8.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 75: blk.8.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 76: blk.8.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 77: blk.8.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 78: blk.8.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 79: blk.8.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 80: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.9.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 83: blk.9.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 84: blk.9.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 85: blk.9.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 86: blk.9.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 87: blk.9.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 88: blk.9.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 89: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 90: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 91: blk.10.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 92: blk.10.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 93: blk.10.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 94: blk.10.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 95: blk.10.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 96: blk.10.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 97: blk.10.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 98: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 99: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.11.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 101: blk.11.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 102: blk.11.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 103: blk.11.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 109: blk.12.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 110: blk.12.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.13.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 119: blk.13.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 127: blk.14.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 128: blk.14.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.15.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 235: blk.26.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 236: blk.26.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.27.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 245: blk.27.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 253: blk.28.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 254: blk.28.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 262: blk.29.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 263: blk.29.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 266: blk.29.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 271: blk.30.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 272: blk.30.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 277: blk.30.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 280: blk.31.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 281: blk.31.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 290: output.weight f16 [ 4096, 46336, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: general.source.hugginface.repository str
llama_model_loader: - kv 3: llama.tensor_data_layout str
llama_model_loader: - kv 4: llama.context_length u32
llama_model_loader: - kv 5: llama.embedding_length u32
llama_model_loader: - kv 6: llama.block_count u32
llama_model_loader: - kv 7: llama.feed_forward_length u32
llama_model_loader: - kv 8: llama.rope.dimension_count u32
llama_model_loader: - kv 9: llama.attention.head_count u32
llama_model_loader: - kv 10: llama.attention.head_count_kv u32
llama_model_loader: - kv 11: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 14: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 226 tensors
error loading model: key not found in model: tokenizer.ggml.tokens
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/7B/ggml-model-f16.gguf'
main: error: unable to load model

@kurugai kurugai changed the title [User] Insert summary of your issue or enhancement.. convert-llama-hf-to-gguf.After py conversion, model loading is not possible with converted gguf file. Aug 29, 2023
@kurugai kurugai changed the title convert-llama-hf-to-gguf.After py conversion, model loading is not possible with converted gguf file. convert-llama-hf-to-gguf.The gguf file converted to py does not load. Aug 29, 2023
@KerfuffleV2
Collaborator

I think tokenizer.model was missing from the directory you converted from. Right now, some of those scripts just skip including vocabulary if the file isn't there without informing the user.

llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from models/7B/ggml-model-f16.gguf (version GGUF V1L����.llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 46336, 1, 1 ]

The special characters at the GGUF version also look kind of weird. I'm pretty sure your main issue is the tokenizer.model thing, though.
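
(As a quick check before converting, a small Python sketch like the one below can show which tokenizer files a model directory actually contains; the file names are just the ones discussed in this thread, and the script name is made up.)

import sys
from pathlib import Path

# Usage (hypothetical): python check_tokenizer_files.py .\models\kfkas_Llama-2-ko-7b-Chat
# Reports which tokenizer-related files are present in the model directory.
model_dir = Path(sys.argv[1])
for name in ("tokenizer.model", "tokenizer.json", "vocab.json"):
    print(name, "found" if (model_dir / name).exists() else "missing")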

@kurugai
Author

kurugai commented Aug 29, 2023

Dear KerfuffleV2,

Thank you for your reply.
As you mentioned, there is no tokenizer.model file in the model I'm trying to convert to GGUF.
However, I confirmed that a tokenizer.json file is there.
I'm sorry to keep asking questions, but can I ask you how to make tokenizer.model?

MODEL URL : https://huggingface.co/kfkas/Llama-2-ko-7b-Chat/tree/main

FILE LIST
.gitattributes
LICENSE
README.md
config.json
generation_config.json
pytorch_model-00001-of-00002.bin
pytorch_model-00002-of-00002.bin
pytorch_model.bin.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json

@KerfuffleV2
Collaborator

I'm sorry to keep asking questions, but can I ask you how to make tokenizer.model?

No need to apologize. I think that because it's a Korean model, it uses a different tokenizer type than that script expects. From your link:

"Since Llama-2-Ko uses FastTokenizer provided by HF tokenizers NOT sentencepiece package, it is required to use use_fast=True option when initialize tokenizer."

I'm not an expert on this, but I think that may mean it uses a BPE tokenizer rather than SPM (which is typical for LLaMA models). I don't know if it will work, but you can try using the main convert.py script with --vocabtype bpe

It's possible this model uses a type of tokenizer or configuration that llama.cpp doesn't currently support.

@klosax
Collaborator

klosax commented Aug 29, 2023

I'm not an expert on this, but I think that may mean it uses a BPE tokenizer rather than SPM

In tokenizer.json it looks like it uses the BPE tokenizer:

...
  "model": {
    "type": "BPE",
    "dropout": null,
...
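
(A minimal sketch, assuming tokenizer.json sits in the model directory, for checking which tokenizer type a Hugging Face model declares; the script name is made up.)

import json, sys

# Usage (hypothetical): python print_tokenizer_type.py .\models\kfkas_Llama-2-ko-7b-Chat\tokenizer.json
# Prints the declared tokenizer type, e.g. "BPE".
with open(sys.argv[1], encoding="utf-8") as f:
    tokenizer = json.load(f)
print(tokenizer["model"]["type"])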

@kurugai
Author

kurugai commented Aug 29, 2023

I think I need the vocab.json file. However, there is an error because this file is not in this model folder.

E:\AI\llama.cpp>python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00001-of-00002.bin
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00001-of-00002.bin
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00002-of-00002.bin
params = Params(n_vocab=46336, n_embd=4096, n_mult=5504, n_layer=32, n_ctx=2048, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-05, f_rope_freq_base=None, f_rope_scale=None, ftype=None, path_model=WindowsPath('models/kfkas_Llama-2-ko-7b-Chat'))
Traceback (most recent call last):
File "E:\AI\llama.cpp\convert.py", line 1172, in
main()
File "E:\AI\llama.cpp\convert.py", line 1156, in main
vocab = load_vocab(vocab_dir, args.vocabtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\llama.cpp\convert.py", line 1064, in load_vocab
raise FileNotFoundError(
FileNotFoundError: Could not find vocab.json in models\kfkas_Llama-2-ko-7b-Chat or its parent; if it's in another directory, pass the directory as --vocab-dir

@klosax
Collaborator

klosax commented Aug 29, 2023

I think I need the vocab.json file. However, there is an error because this file is not in this model folder.

No, the conversion script does this wrong; it should use the tokenizer.json file if it exists.

@KerfuffleV2
Collaborator

I think this little script will work for extracting the vocab:

import json, sys
tokenizer = json.load(sys.stdin)
json.dump(tokenizer['model']['vocab'], sys.stdout)

It reads from standard input and writes to standard output so you'll need to do something like:

python blah.py < tokenizer.json > vocab.json

@kurugai
Author

kurugai commented Aug 29, 2023

I've made progress thanks to your continued guidance.
As you said, I created the blah.py script and ran it at the DOS prompt, and the vocab.json file was generated correctly. Thank you.

I then ran the command below and confirmed that the a.gguf file was also created without errors:
python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat

--- output ----
[289/291] Writing tensor blk.31.ffn_norm.weight | size 4096 | type F32 | T+ 23
[290/291] Writing tensor output_norm.weight | size 4096 | type F32 | T+ 23
[291/291] Writing tensor output.weight | size 46336 x 4096 | type F16 | T+ 23
Wrote a.gguf

--- a.gguf's info ---
2023-08-29 PM 09:31 13,713,148,992 a.gguf

However, when I run 'E:\AI\llama.cpp>main -m a.gguf', the LLM should generate some text, but it produces nothing. I think the GGUF file was created correctly, so this is strange.

--- output ----
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_print_meta: format = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 46336
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model size = 6.86 B
llm_load_print_meta: general.name = LLaMA
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 '
'
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 6947.73 MB (+ 256.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 99.91 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

       <----- no generated string

@KerfuffleV2
Collaborator

What if you specify a prompt like:

main -m a.gguf -p "Why is the sky blue?"

@kurugai
Author

kurugai commented Aug 29, 2023

import json, sys
tokenizer = json.load(sys.stdin)
json.dump(tokenizer['model']['vocab'], sys.stdout, ensure_ascii=False)

I added 'ensure_ascii=False' to json.dump because of a Korean Unicode display problem.

@kurugai
Author

kurugai commented Aug 29, 2023

run : main -m a.gguf -p "Why is the sky blue?"

output :
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

                        <----- I waited for about 5 minutes, but no strings were generated.

^C
E:\AI\llama.cpp>

@KerfuffleV2
Collaborator

If there was going to be output, you'd see it pretty quickly. This is almost certainly an issue with the vocabulary but I'm not knowledgeable enough to really fix it.

Just in case it's something to do with the ensure_ascii thing or using redirection, you can try this alternative for converting the vocab:

import json
with open("tokenizer.json", "r", encoding="utf-8") as f:
  tokenizer = json.load(f)
with open("vocab.json", "w", encoding="utf-8") as f:
  json.dump(tokenizer['model']['vocab'], f)

I doubt it will make a difference though. If not, hopefully someone else will be able to help you.
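
(Purely for reference, a sketch combining the two suggestions above: the file-based version of the extraction script with the ensure_ascii=False option from the earlier comment added back, so non-ASCII tokens stay readable in vocab.json. Whether that affects loading is an open question.)

import json

# Read tokenizer.json and write its vocab to vocab.json, keeping Korean text readable.
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = json.load(f)
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(tokenizer['model']['vocab'], f, ensure_ascii=False)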

@kurugai
Author

kurugai commented Aug 29, 2023

Dear KerfuffleV2,

I created vocab.json with the modified code, but it is not generating the same string.
I've been searching online, but it's not an easy fight :-)
Thank you for helping me all day long.

@KerfuffleV2
Collaborator

I created vocab.json with the modified code, but it is not generating the same string.

Do you just mean the result is the same: no output? If so, unfortunately that's pretty much what I expected because I didn't expect the second version of the conversion script to really make a difference.

I don't think you're doing anything wrong, it just doesn't seem like llama.cpp currently supports that particular model.

I'd suggest keeping this issue open but editing it a bit to be something more like "Converting kfkas Llama-2-ko-7b-Chat to GGUF fails", or possibly creating a different issue like "Please add support for kfkas llama-2-ko-7b-chat" and linking here for context.

@kurugai kurugai changed the title convert-llama-hf-to-gguf.The gguf file converted to py does not load. Converting kfkas Llama-2-ko-7b-Chat to GGUF fails Aug 29, 2023
@kurugai
Author

kurugai commented Aug 29, 2023

Do you just mean the result is the same: no output?

Yes. The result is the same: no output was generated.

As you suggested, I revised the title of this issue and also opened a new issue. Thank you for your advice. ^^

@akeyhero

akeyhero commented Aug 29, 2023

I could reproduce this on the original Llama 2 with --vocabtype bpe.

Note that the tokenizer.json of the original Llama 2 also says type == BPE, even though it does include tokenizer.model, and I confirmed that the Llama 2 GGUF works when converted with tokenizer.model (namely, without --vocabtype bpe):

(snip)
  "model": {
    "type": "BPE",
    "dropout": null,
    "unk_token": "<unk>",
    "continuing_subword_prefix": null,
    "end_of_word_suffix": null,
    "fuse_unk": true,
    "byte_fallback": true,
    "vocab": {
      "<unk>": 0,
(snip)

I downloaded Llama 2 files in models/Llama-2-7b-chat-hf and then

# create vocab.json
$ cat models/Llama-2-7b-chat-hf/tokenizer.json | jq --ascii-output '.model.vocab' > models/Llama-2-7b-chat-hf/vocab.json
$ python convert.py models/Llama-2-7b-chat-hf --vocabtype bpe
$ ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n 128 -p "$(echo "<s>[INST] How are you? [/INST]")"
(snip)
llm_load_print_meta: format         = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly F16
llm_load_print_meta: model size     = 6.74 B
llm_load_print_meta: general.name   = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.09 MB
llm_load_tensors: mem required  = 12853.10 MB (+  256.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   71.91 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |

main: prompt: '<s>[INST] How are you? [/INST]'
main: number of tokens in prompt = 18
     1 -> ''
   529 -> ''
 29879 -> ''
 24566 -> ''
 29902 -> ''
  3059 -> ''
 29911 -> ''
 29962 -> ''
  1128 -> ''
   526 -> ''
   366 -> ''
 29973 -> ''
   518 -> ''
 29914 -> ''
 29902 -> ''
  3059 -> ''
 29911 -> ''
 29962 -> ''

sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


 [end of text]

I wonder if vocab type = SPM is correct in this setting.

(Of course, I can convert with tokenizer.model in the case of the original Llama 2, but the model I actually want to try does not have a tokenizer.model.)

@klosax
Copy link
Collaborator

klosax commented Aug 29, 2023

I wonder if vocab type = SPM is correct in this setting.

No, the conversion script should set the tokenizer model kv properly to gpt2 when the source model uses a BPE tokenizer.
@KerfuffleV2

@KerfuffleV2
Copy link
Collaborator

No, the conversion script should set the tokenizer model kv properly to gpt2 when the source model uses a BPE tokenizer.

Ahh, it seems like convert.py just always sets it to llama no matter what. I can fix it in #2842

@KerfuffleV2
Copy link
Collaborator

@kurugai If you want to try what klosax suggested, find the line

        self.gguf.add_tokenizer_model("llama")

in convert.py and change it to this:

        if isinstance(vocab, SentencePieceVocab):
            self.gguf.add_tokenizer_model("llama")
        elif isinstance(vocab, BpeVocab):
            self.gguf.add_tokenizer_model("gpt2")
        else:
            raise ValueError(f'Unknown vocab type: Not BpeVocab or SentencePieceVocab')

@klosax
Copy link
Collaborator

klosax commented Aug 29, 2023

This line:

self.gguf.add_tokenizer_model("llama")

@kurugai
Copy link
Author

kurugai commented Aug 29, 2023

@KerfuffleV2
I modified convert.py as follows.

        #self.gguf.add_tokenizer_model("llama")
        if isinstance(vocab, SentencePieceVocab):
            self.gguf.add_tokenizer_model("llama")
        elif isinstance(vocab, BpeVocab):
            self.gguf.add_tokenizer_model("gpt2")
        else:
            raise ValueError(f'Unknown vocab type: Not BpeVocab or SentencePieceVocab')

And I made 'a.gguf' using the command below.

python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat

However, when executing the main command, the following error message was displayed during the model loading process.

main -m a.gguf -p "Why is the sky blue?"
................. (omission)
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
error loading model: cannot find tokenizer merges in model file

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'a.gguf'
main: error: unable to load model

@KerfuffleV2
Copy link
Collaborator

KerfuffleV2 commented Aug 29, 2023

Ahh, I forgot the version in master doesn't handle merges. If you're comfortable with testing a pull, you can try checking out #2842 and using that (you'll need to install the GGUF package from that pull as well).

Unless you're really impatient, your best bet is probably to just wait until that pull gets merged. That will hopefully fix this issue.

edit: Just want to add that I'd be really happy for people to test those changes. So if you do want to try it but need to ask some questions first, that's no problem. Don't be afraid of bothering me, it's up to whether you feel like going through the trouble or not.

@kurugai
Copy link
Author

kurugai commented Aug 30, 2023

@KerfuffleV2

Thank you for your feedback. I'm not used to testing pull requests, so I'll wait until it merges. The day it's merged, I'll check it right away. Thank you for your sincere help.

@akeyhero
Copy link

I ran make clean and make after checking out KerfuffleV2/feat-scripts-improvements.
I got:

$ python convert.py models/Llama-2-7b-chat-hf --vocabtype bpe
Traceback (most recent call last):
  File "/Users/xxxx/projects/ggerganov/llama.cpp/convert.py", line 808, in <module>
    class OutputFile:
  File "/Users/xxxx/projects/ggerganov/llama.cpp/convert.py", line 859, in OutputFile
    def add_meta_special_vocab(self, svocab: gguf.SpecialVocab) -> None:
                                             ^^^^^^^^^^^^^^^^^
AttributeError: module 'gguf' has no attribute 'SpecialVocab'

Do you have any idea to solve this?

@KerfuffleV2
Copy link
Collaborator

Do you have any idea to solve this?

You need to install the gguf Python package from that fork. Assuming you're already in a Python virtual environment, you can do pip install --upgrade ./gguf-py

You might need to reactivate the environment also.
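
A quick way to confirm the right gguf package is actually being picked up is to check for the attribute from the traceback above (just a small sketch):

import gguf

# The traceback complained that module 'gguf' has no attribute 'SpecialVocab',
# so this should print True once the package from the pull is installed.
print(hasattr(gguf, "SpecialVocab"))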

@akeyhero
Copy link

akeyhero commented Aug 30, 2023

Thank you. I'd totally forgotten about the pip stuff.

I ran this in addition to #2865 (comment) (although convert.py gave no error even without merges.txt):

$ cat models/Llama-2-7b-chat-hf/tokenizer.json | jq -r --ascii-output '.model.merges[]' > models/Llama-2-7b-chat-hf/merges.txt
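
If jq isn't available (e.g. on plain Windows), roughly the same merges.txt can be produced with a small Python sketch. This assumes the merges in tokenizer.json are stored as plain "token1 token2" strings, and it writes them as literal UTF-8 rather than ASCII-escaped:

import json

# Paths are just examples; point them at wherever the HF model files live.
with open("models/Llama-2-7b-chat-hf/tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = json.load(f)
with open("models/Llama-2-7b-chat-hf/merges.txt", "w", encoding="utf-8") as f:
    for merge in tokenizer["model"]["merges"]:
        f.write(merge + "\n")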

And then:

$ ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n 128 -p "$(echo "<s>[INST] How are you? [/INST]")"
(snip)
ERROR: byte not found in vocab: '
'
zsh: segmentation fault  ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n

Any idea? 😭

@KerfuffleV2
Copy link
Collaborator

KerfuffleV2 commented Aug 30, 2023

Any idea?

You're just trying this with a normal LLaMA2 model, not the one OP was testing, right? The only thing I can think of is that it's because you're using a model that wasn't intended to use the BPE tokenizer mode. I'm not an expert on the tokenizer stuff, so that idea might not be worth too much. I'm going to download OP's exact model and try it; if I get the same result as you, then we'll know it's not because of what I mentioned.

edit: Your issue looks like #2889 so maybe it's just an issue with the BPE tokenizer and nothing you did. You could try loading the model you generated with #2842 using main compiled from #2889 and see if that fixes your issue.


edit: So, I got OP's Korean model converted (it did require generating vocab.json). This does need #2889 to avoid dying immediately. All the token contents still map to blank strings because convert adds BPE vocab tokens as USER_DEFINED but there's no case to handle converting those to strings (there's a partial workaround in the comments for that pull).

@akeyhero
Copy link

Thank you for your reply.
#2889 should be exactly my issue.

@KerfuffleV2
Copy link
Collaborator

Unfortunately, even with the change I suggested in the comments there it's still not really going to be correct. You'll see stuff like <0x20> instead of spaces.

@kurugai
Copy link
Author

kurugai commented Aug 30, 2023

@KerfuffleV2

Version info

  • gguf 0.3.0
  • llama.cpp 2023-08-30 ver

Comment

Hi. I think it has been merged, so I installed the new llama.cpp and gguf packages and made 'a.gguf' in the same way as yesterday.
The following errors were displayed when running main, and no inference output was generated.

Is it correct that the merge has been completed?

ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '

Below is the full log of the main command.

E:\AI\llama.cpp>main -m a.gguf -p "Why is the sky blue?"
Log start
main: build = 1128 (b532a69)
main: seed  = 1693403947
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from a.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q8_0     [  4096, 46336,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    6:            blk.0.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    7:              blk.0.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    8:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    9:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   10:              blk.1.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   11:              blk.1.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   12:              blk.1.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   13:         blk.1.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   14:            blk.1.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   15:            blk.1.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   16:              blk.1.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   17:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   18:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   19:              blk.2.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   20:              blk.2.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   21:              blk.2.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   22:         blk.2.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   23:            blk.2.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   24:            blk.2.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   25:              blk.2.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   26:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   27:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   28:              blk.3.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   29:              blk.3.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   30:              blk.3.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   31:         blk.3.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   32:            blk.3.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   33:            blk.3.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   34:              blk.3.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   35:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   36:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   37:              blk.4.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   38:              blk.4.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   39:              blk.4.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   40:         blk.4.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   41:            blk.4.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   42:            blk.4.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   43:              blk.4.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   44:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   45:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   46:              blk.5.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   47:              blk.5.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   48:              blk.5.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   49:         blk.5.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   50:            blk.5.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   51:            blk.5.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   52:              blk.5.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   53:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   54:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   55:              blk.6.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   56:              blk.6.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   57:              blk.6.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   58:         blk.6.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   59:            blk.6.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   60:            blk.6.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   61:              blk.6.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   62:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   63:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   64:              blk.7.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   65:              blk.7.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   66:              blk.7.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   67:         blk.7.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   68:            blk.7.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   69:            blk.7.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   70:              blk.7.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   71:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   72:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   73:              blk.8.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   74:              blk.8.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   75:              blk.8.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   76:         blk.8.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   77:            blk.8.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   78:            blk.8.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   79:              blk.8.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   80:           blk.8.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   81:            blk.8.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   82:              blk.9.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   83:              blk.9.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   84:              blk.9.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   85:         blk.9.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   86:            blk.9.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   87:            blk.9.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   88:              blk.9.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   89:           blk.9.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   90:            blk.9.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   91:             blk.10.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   92:             blk.10.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   93:             blk.10.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   94:        blk.10.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   95:           blk.10.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   96:           blk.10.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   97:             blk.10.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   98:          blk.10.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   99:           blk.10.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  100:             blk.11.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  101:             blk.11.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  102:             blk.11.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  103:        blk.11.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  104:           blk.11.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  105:           blk.11.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  106:             blk.11.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  107:          blk.11.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  108:           blk.11.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  109:             blk.12.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  110:             blk.12.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  111:             blk.12.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  112:        blk.12.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  113:           blk.12.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  114:           blk.12.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  115:             blk.12.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  116:          blk.12.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  117:           blk.12.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  118:             blk.13.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  119:             blk.13.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  120:             blk.13.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  121:        blk.13.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  122:           blk.13.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  123:           blk.13.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  124:             blk.13.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  125:          blk.13.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  126:           blk.13.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  127:             blk.14.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  128:             blk.14.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  129:             blk.14.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  130:        blk.14.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  131:           blk.14.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  132:           blk.14.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  133:             blk.14.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  134:          blk.14.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  135:           blk.14.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  136:             blk.15.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  137:             blk.15.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  138:             blk.15.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  139:        blk.15.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  140:           blk.15.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  141:           blk.15.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  142:             blk.15.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  143:          blk.15.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  144:           blk.15.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  145:             blk.16.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  146:             blk.16.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  147:             blk.16.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  148:        blk.16.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  149:           blk.16.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  150:           blk.16.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  151:             blk.16.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  152:          blk.16.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  153:           blk.16.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  154:             blk.17.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  155:             blk.17.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  156:             blk.17.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  157:        blk.17.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  158:           blk.17.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  159:           blk.17.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  160:             blk.17.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  161:          blk.17.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  162:           blk.17.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  163:             blk.18.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  164:             blk.18.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  165:             blk.18.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  166:        blk.18.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  167:           blk.18.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  168:           blk.18.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  169:             blk.18.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  170:          blk.18.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  171:           blk.18.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  172:             blk.19.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  173:             blk.19.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  174:             blk.19.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  175:        blk.19.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  176:           blk.19.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  177:           blk.19.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  178:             blk.19.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  179:          blk.19.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  180:           blk.19.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  181:             blk.20.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  182:             blk.20.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  183:             blk.20.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  184:        blk.20.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  185:           blk.20.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  186:           blk.20.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  187:             blk.20.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  188:          blk.20.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  189:           blk.20.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  190:             blk.21.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  191:             blk.21.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  192:             blk.21.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  193:        blk.21.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  194:           blk.21.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  195:           blk.21.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  196:             blk.21.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  197:          blk.21.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  198:           blk.21.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  199:             blk.22.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  200:             blk.22.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  201:             blk.22.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  202:        blk.22.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  203:           blk.22.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  204:           blk.22.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  205:             blk.22.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  206:          blk.22.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  207:           blk.22.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  208:             blk.23.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  209:             blk.23.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  210:             blk.23.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  211:        blk.23.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  212:           blk.23.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  213:           blk.23.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  214:             blk.23.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  215:          blk.23.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  216:           blk.23.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  217:             blk.24.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  218:             blk.24.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  219:             blk.24.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  220:        blk.24.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  221:           blk.24.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  222:           blk.24.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  223:             blk.24.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  224:          blk.24.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  225:           blk.24.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  226:             blk.25.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  227:             blk.25.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  228:             blk.25.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  229:        blk.25.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  230:           blk.25.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  231:           blk.25.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  232:             blk.25.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  233:          blk.25.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  234:           blk.25.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  235:             blk.26.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  236:             blk.26.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  237:             blk.26.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  238:        blk.26.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  239:           blk.26.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  240:           blk.26.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  241:             blk.26.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  242:          blk.26.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  243:           blk.26.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  244:             blk.27.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  245:             blk.27.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  246:             blk.27.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  247:        blk.27.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  248:           blk.27.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  249:           blk.27.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  250:             blk.27.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  251:          blk.27.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  252:           blk.27.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  253:             blk.28.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  254:             blk.28.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  255:             blk.28.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  256:        blk.28.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  257:           blk.28.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  258:           blk.28.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  259:             blk.28.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  260:          blk.28.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  261:           blk.28.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  262:             blk.29.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  263:             blk.29.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  264:             blk.29.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  265:        blk.29.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  266:           blk.29.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  267:           blk.29.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  268:             blk.29.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  269:          blk.29.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  270:           blk.29.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  271:             blk.30.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  272:             blk.30.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  273:             blk.30.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  274:        blk.30.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  275:           blk.30.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  276:           blk.30.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  277:             blk.30.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  278:          blk.30.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  279:           blk.30.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  280:             blk.31.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  281:             blk.31.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  282:             blk.31.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  283:        blk.31.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  284:           blk.31.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  285:           blk.31.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  286:             blk.31.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  287:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  288:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  289:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  290:                    output.weight q8_0     [  4096, 46336,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str
llama_model_loader: - kv   1:                               general.name str
llama_model_loader: - kv   2:                       llama.context_length u32
llama_model_loader: - kv   3:                     llama.embedding_length u32
llama_model_loader: - kv   4:                          llama.block_count u32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32
llama_model_loader: - kv   7:                 llama.attention.head_count u32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv  10:                          general.file_type u32
llama_model_loader: - kv  11:                       tokenizer.ggml.model str
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr
llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q8_0:  226 tensors
ERROR: byte not found in vocab: '
'
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = BPE
llm_load_print_meta: n_vocab        = 46336
llm_load_print_meta: n_merges       = 77738
llm_load_print_meta: n_ctx_train    = 2048
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly Q8_0
llm_load_print_meta: model size     = 6.86 B
llm_load_print_meta: general.name   = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 0 '<unk>'
llm_load_tensors: ggml ctx size =    0.09 MB
llm_load_tensors: mem required  = 6947.73 MB (+  256.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   99.97 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

@KerfuffleV2
Copy link
Collaborator

Is it correct that the merge has been completed?

Yes, it got merged today. Unfortunately, that wasn't enough to fix models using BPE (like this one). Look a little higher in the thread; I linked to a pull with a fix for the "byte not found" issue. However, even with that change the content of all the tokens is still blank. There's a partial fix in the comments, but there are still problems.

The good news is that people seem to be aware of at least some of the problems, and they're being looked at and worked on.

@kurugai
Copy link
Author

kurugai commented Aug 30, 2023

@KerfuffleV2

The good news is that people seem to be aware of at least some of the problems, and they're being looked at and worked on.

Good news! I'll try again whenever there's a related source change in the future. :)

@akeyhero
Copy link

akeyhero commented Sep 8, 2023

@kurugai The byte not found in vocab errors might have been solved now that #2889 has been merged.
(I still failed with an error myself, though: #2965)

@kurugai
Copy link
Author

kurugai commented Nov 28, 2023

@KerfuffleV2
I successfully converted the model with the convert.py from the branch below. Thank you for your help in the meantime. I'll close this one.
https://github.com/strutive07/llama.cpp/tree/convert_hf_vocab

@kurugai kurugai closed this as completed Nov 28, 2023