[Question]: High PPL on wikitext2 of ReLU-LLAMA-7B for language modeling tasks #162

llCurious · 2024-03-11T06:13:12Z

Prerequisites

Before submitting your question, please ensure the following:

I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
I have carefully read and followed the instructions in the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).

Question Details

I am using the sparsified LLaMA model ReluLLaMA-7B from SparseLLM on Hugging Face. I compute the perplexity (PPL) on the wikitext2 dataset (from Hugging Face) with a max_seq_length of 512. The PPL reaches 16003, whereas the original dense LLaMA 2 model, Llama-2-7b-hf, has a PPL of 54.

The ReLU activation seems to cause a huge PPL degradation. Do you have any idea what might explain this?

Additional Context

All the packages use the latest version. All the models and datasets are from Huggingface.

llCurious · 2024-03-11T08:18:45Z

Hi @hodlen, do you have any ideas?

hodlen · 2024-04-06T14:16:51Z

Sorry for the late reply. That was a bit unexpected since we have tested its perplexity under both transformers/torch and PowerInfer. Can you provide minimal reproducible code so we can further help to investigate?
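As a starting point for such a reproduction, here is a minimal sketch of how PPL over fixed windows of max_seq_length tokens is commonly computed: sum the negative log-likelihood of every predicted token and exponentiate the mean. This is not the script used in the question. The names `chunk_tokens`, `perplexity`, and `log_prob_fn` are hypothetical, and `log_prob_fn` is a stand-in for a real model call (for example, the log-softmax of a causal LM's logits at the target position).

```python
import math
from typing import Callable, List, Sequence


def chunk_tokens(token_ids: Sequence[int], max_seq_length: int) -> List[Sequence[int]]:
    """Split a token stream into non-overlapping windows of at most max_seq_length."""
    return [
        token_ids[start:start + max_seq_length]
        for start in range(0, len(token_ids), max_seq_length)
    ]


def perplexity(
    token_ids: Sequence[int],
    log_prob_fn: Callable[[Sequence[int], int], float],
    max_seq_length: int = 512,
) -> float:
    """Compute exp(mean negative log-likelihood) over all predicted tokens.

    log_prob_fn(context, token) is a hypothetical stand-in for the model:
    it must return the natural-log probability of `token` given `context`.
    """
    total_nll = 0.0
    total_predicted = 0
    for window in chunk_tokens(token_ids, max_seq_length):
        # The first token of each window has no context, so it is not scored.
        for pos in range(1, len(window)):
            total_nll -= log_prob_fn(window[:pos], window[pos])
            total_predicted += 1
    if total_predicted == 0:
        raise ValueError("need at least one window with two or more tokens")
    return math.exp(total_nll / total_predicted)


if __name__ == "__main__":
    # Sanity check with a toy "model" that is uniform over 100 tokens:
    # its perplexity should equal the vocabulary size.
    vocab_size = 100
    uniform = lambda context, token: -math.log(vocab_size)
    print(perplexity(list(range(1000)), uniform, max_seq_length=512))
```

The uniform toy model is a useful reference point: a predictor that guesses uniformly over a vocabulary of V tokens has a PPL of V. Running the same harness against both ReluLLaMA-7B and Llama-2-7b-hf, with identical tokenization and windowing, would help separate an evaluation issue from a model issue.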

llCurious added the "question" label (Further information is requested) on Mar 11, 2024
