Problems encountered during speculative decoding execution #888

Open
PoHaoYen opened this issue Jul 31, 2024 · 1 comment

Comments

@PoHaoYen

Hi, I attempted to use speculative decoding but encountered some errors. May I ask for your assistance?

I used the parameters from the first example.

python ./examples/speculative_inference.py \
    --model gpt2-xl \
    --draft_model gpt2 \
    --temperature 0.3 \
    --gamma 5 \
    --max_new_tokens 512 \
    --gpu 0

An error occurred during the first execution:
RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: gpu
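For reference, this message is raised when a device string is parsed and its type is not one of the names it lists; "gpu" is not among them, while "cuda" is. A minimal sketch of normalizing such a string before it is handed to PyTorch is shown below. `normalize_device` and `VALID_DEVICE_TYPES` are hypothetical names used for illustration only and are not part of this repository or of PyTorch; the set of types is copied from the error message above. This only addresses the string format and would not by itself make a model class accept a device it does not support.

```python
# Device types copied from the RuntimeError message above.
VALID_DEVICE_TYPES = {
    "cpu", "cuda", "ipu", "xpu", "mkldnn", "opengl", "opencl", "ideep",
    "hip", "ve", "fpga", "ort", "xla", "lazy", "vulkan", "mps", "meta",
    "hpu", "mtia", "privateuseone",
}


def normalize_device(name: str) -> str:
    """Hypothetical helper: map the alias "gpu" to "cuda" and validate the type."""
    # Split an optional index off the device type, e.g. "gpu:0" -> ("gpu", ":", "0").
    device_type, sep, index = name.strip().lower().partition(":")
    if device_type == "gpu":
        device_type = "cuda"
    if device_type not in VALID_DEVICE_TYPES:
        raise ValueError(f"unsupported device type: {device_type!r}")
    return device_type + sep + index


print(normalize_device("gpu"))
print(normalize_device("gpu:0"))
```

The returned string could then be passed to `torch.device(...)`, assuming PyTorch is installed and the chosen backend is available on the machine.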

Then I modified HFDecoderModel in hf_decoder_model.py to use cuda, and the following error occurred:
NotImplementedError: device "cuda" is not supported

On the third attempt, I changed it to use cpu and got the error:
ValueError: The following model_kwargs are not used by the model: ['use_accelerator']

Is there any configuration or environment setting error on my part?

@huangzl19

I encountered the same problem.
