
convert-hf-to-gguf.py Qwen-72B-Chat model get Killed result #5156

Closed
dfengpo opened this issue Jan 27, 2024 · 12 comments
Labels
bug (Something isn't working), stale

Comments

dfengpo commented Jan 27, 2024

I use python convert-hf-to-gguf.py /Qwen-72B-Chat.
And I am getting the same error:
blk.33.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.33.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
blk.33.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_qkv.bias, n_dims = 1, torch.bfloat16 --> float32
blk.34.attn_qkv.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.34.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.34.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_qkv.bias, n_dims = 1, torch.bfloat16 --> float32
blk.35.attn_qkv.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.35.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
Killed

What does "Killed" mean?
@ggerganov @slaren @prusnak

prusnak (Sponsor Collaborator) commented Jan 27, 2024

I assume you have run out of memory. How much RAM do you have?

dfengpo (Author) commented Jan 27, 2024

I assume you have run out of memory. How much RAM do you have?

I have 64 GB of RAM and 32 CPU cores.

ngxson (Collaborator) commented Jan 27, 2024

The process is likely to be killed because of low memory: https://stackoverflow.com/questions/726690/what-killed-my-process-and-why

Galunid (Collaborator) commented Jan 27, 2024

It seems like a bug, and the offending line is

model_kv = dict(self.get_tensors())

This makes the script load the whole (80 GB+) model into memory instead of using mmap from torch.

@lmxin123 Could you try the script with the changes below?

diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py
index 7a0a8c3d..8cef8429 100755
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -996,9 +996,8 @@ class QwenModel(Model):
 
     def write_tensors(self):
         block_count = self.hparams["num_hidden_layers"]
-        model_kv = dict(self.get_tensors())
         tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
-        for name, data_torch in model_kv.items():
+        for name, data_torch in self.get_tensors():
             # we don't need these
             if name.endswith(".rotary_emb.inv_freq"):
                 continue
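
For reference, here is a minimal sketch of the difference, assuming a hypothetical iter_tensors helper and a typical Hugging Face shard naming pattern (neither is the actual convert-hf-to-gguf.py code), and using torch.load's mmap flag, which is only available in recent PyTorch releases:

from pathlib import Path
import torch

def iter_tensors(model_dir: str):
    # Walk the checkpoint shards one at a time; mmap=True keeps tensor
    # data on disk until it is actually read (recent PyTorch only).
    for shard in sorted(Path(model_dir).glob("pytorch_model-*.bin")):
        state_dict = torch.load(shard, map_location="cpu", mmap=True)
        yield from state_dict.items()

# Eager: dict(...) forces the whole (80 GB+) model into RAM at once.
# model_kv = dict(iter_tensors("/Qwen-72B-Chat"))

# Lazy: at most one shard is mapped at a time and each tensor is
# converted as it streams past, similar in spirit to the patched loop.
for name, tensor in iter_tensors("/Qwen-72B-Chat"):
    data = tensor.to(torch.float16).numpy()  # convert, write out, then drop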

Galunid added the bug (Something isn't working) label and removed the bug-unconfirmed label on Jan 27, 2024
dfengpo (Author) commented Jan 28, 2024

Thank you for your response, but unfortunately I don't know Python, so I'm unable to test your modifications. I look forward to an official updated version.

timopb commented Jan 28, 2024


Having the same issue converting falcon-40b on a machine with 24 GB of RAM. The process gets killed, most likely due to lack of memory. I applied the patch from your response, but unfortunately it didn't help.

Galunid (Collaborator) commented Jan 28, 2024

@lmxin123 Could you try with this script? https://gist.github.com/Galunid/c169dd4078c9cb11e8d8a4a8888eab2b
Just copy the contents into convert-hf-to-gguf.py and run it like you normally would.

Galunid (Collaborator) commented Jan 28, 2024

@timopb Falcon is a separate issue and the above is not applicable.

arch-btw (Contributor) commented Feb 4, 2024

I'm having the same problem even with the new script by @Galunid.

It's not just loading the original model into RAM; it's also writing the new model to RAM first instead of to disk.
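
If that's the case, one thing that might be worth checking (an assumption on my part, not a confirmed cause) is gguf-py's GGUFWriter use_temp_file option, which spools tensor data to a temporary file on disk instead of keeping everything buffered in memory until the final write. A rough standalone sketch of the idea, with made-up metadata and tensor values:

import gguf
import numpy as np

# Toy example only; the real conversion script constructs its own writer.
writer = gguf.GGUFWriter("out.gguf", "qwen", use_temp_file=True)
writer.add_block_count(80)

# With use_temp_file=True the tensor bytes go to a temp file on disk
# rather than accumulating in RAM until write_tensors_to_file().
writer.add_tensor("blk.0.ffn_down.weight",
                  np.zeros((4, 4), dtype=np.float16))

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()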

github-actions bot commented Mar 18, 2024
This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Mar 18, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 2, 2024
okwinds commented Apr 6, 2024

This issue is likely caused by an Out of Memory (OOM) error. You can work around it by creating a swap file so the system has additional virtual memory to fall back on. Here's how to set one up:

  1. First, create a swap file with a size that corresponds to the size of your model. For instance, if you require a substantial amount of memory, you can allocate 20 gigabytes (20G) for the swap file using the following command:

    sudo fallocate -l 20G /swapfile

  2. Next, change the permissions of the swap file to ensure it is only accessible by the root user:

    sudo chmod 600 /swapfile

  3. Now, turn the file into a swap area that the system can use:

    sudo mkswap /swapfile

  4. Activate the swap file so that it is ready for use:

    sudo swapon /swapfile

With the swap file in place, you should be able to convert your model without encountering an OOM error.

After you have successfully converted your model, you can disable the swap file and delete it to free up space. Here's how:

  1. Turn off the swap file:

    sudo swapoff /swapfile

  2. Remove the swap file:

    sudo rm /swapfile

By following these steps, you can effectively manage your system's memory and avoid OOM errors during model conversion.

By the way, you can use the free -h command to check your RAM and swap usage and confirm that the swap file is active.
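
If you want to script that check, here's a rough Linux-only sketch that reads /proc/meminfo directly (nothing llama.cpp-specific, just standard Python):

def meminfo_gib(field: str) -> float:
    # /proc/meminfo reports values in kB; convert to GiB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1]) / (1024 ** 2)
    raise KeyError(field)

print(f"MemAvailable: {meminfo_gib('MemAvailable'):.1f} GiB")
print(f"SwapFree:     {meminfo_gib('SwapFree'):.1f} GiB")
# Compare the total against the on-disk size of the model you are converting.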

Good luck~

