[Question]: High PPL on wikitext2 of ReLU-LLAMA-7B for language modeling tasks #162
Open
3 tasks done
Labels: question
Prerequisites
Before submitting your question, please ensure the following:
Question Details
I am using the sparsified LLaMA model from SparseLLM on Hugging Face, ReluLLaMA-7B. I compute perplexity (PPL) on the wikitext2 dataset (from Hugging Face) with a max_seq_length of 512. The PPL reaches 16003, whereas the original dense Llama 2 model, Llama-2-7b-hf, has a PPL of 54.
The ReLU activation seems to cause a huge increase in PPL. Do you have any idea what might explain this?
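For reference, here is a minimal sketch of the arithmetic behind these numbers. It is not my actual evaluation script: it assumes the token stream is split into non-overlapping windows of `max_seq_length` tokens and that PPL is the exponential of the mean per-token negative log-likelihood (in nats). The helper names `chunk_tokens`, `perplexity`, and `mean_nll_from_ppl` are hypothetical and only for illustration.

```python
import math
from typing import Iterable, List, Sequence


def chunk_tokens(token_ids: Sequence[int], max_seq_length: int = 512) -> List[Sequence[int]]:
    """Split a token stream into non-overlapping windows of at most max_seq_length tokens.

    Assumption: no sliding window / stride overlap between chunks.
    """
    return [token_ids[i:i + max_seq_length] for i in range(0, len(token_ids), max_seq_length)]


def perplexity(token_nlls: Iterable[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), with NLL in nats."""
    nlls = list(token_nlls)
    if not nlls:
        raise ValueError("need at least one per-token NLL")
    return math.exp(sum(nlls) / len(nlls))


def mean_nll_from_ppl(ppl: float) -> float:
    """Invert the definition: the mean per-token NLL (nats) implied by a perplexity."""
    return math.log(ppl)


# Vocabulary size of the LLaMA / Llama 2 tokenizer.
VOCAB_SIZE = 32000

# A model that assigns uniform probability to every token has NLL = ln(VOCAB_SIZE)
# on every token, so its perplexity equals the vocabulary size.
uniform_ppl = perplexity([math.log(VOCAB_SIZE)] * 10)

if __name__ == "__main__":
    print("mean NLL implied by PPL 54:   ", mean_nll_from_ppl(54))
    print("mean NLL implied by PPL 16003:", mean_nll_from_ppl(16003))
    print("PPL of a uniform guess:       ", uniform_ppl)
```

Under these assumptions, a PPL of 16003 is within a factor of two of what a uniform guess over the 32000-token vocabulary would give, so the gap to 54 looks more like a broken evaluation setup or model load than an ordinary quality drop.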
Additional Context
All packages are at their latest versions. All models and datasets are from Hugging Face.