
Adding GPT-NeoX #164

Closed
wants to merge 2 commits into from

Conversation


@aflah02 aflah02 commented Feb 7, 2024

I followed the instructions here to add GPT-NeoX support, which would bring support for the Pythia model family and other models with a similar architecture.

Reference: #157 (comment)

FIXED (keeping logs for future reference):
I was able to debug most errors, but I'm stuck on this particular one, which happens once I start sending requests to the endpoint (so I assume the model loads correctly):

INFO:     Started server process [1344199]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30013 (Press CTRL+C to quit)
INFO:     127.0.0.1:37998 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 17. #remaining_req: 0. #running_req: 0
Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 165, in exposed_step
    self.forward_step()
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 180, in forward_step
    self.forward_fill_batch(new_batch)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 369, in forward_fill_batch
    logits, (logprobs, normalized_logprobs) = self.model_runner.forward(
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_runner.py", line 486, in forward
    return self.forward_extend(**kwargs)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_runner.py", line 391, in forward_extend
    return self.model.forward(input_ids, input_metadata.positions, input_metadata)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/models/gpt_neox.py", line 236, in forward
    return self.logits_processor(
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/layers/logits_processor.py", line 32, in forward
    last_logits = torch.matmul(last_hidden, weight.T)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ParallelLMHead' object has no attribute 'T'


/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py:204: UserWarning: Warning: available_size=714944, max_total_num_token=714961
KV cache pool leak detected!
  warnings.warn(

Any idea what might be going wrong? The error seems to be related to the LogitsProcessor, which I'm not very familiar with. I tried to copy the logic from the Llama implementation for it.
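The traceback points at `torch.matmul(last_hidden, weight.T)` receiving the `ParallelLMHead` module itself rather than its weight tensor: an `nn.Module` has no `.T`, but its `.weight` parameter does. A minimal sketch reproducing the mistake and the likely fix (the class stub below is hypothetical, not sglang's actual implementation):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a ParallelLMHead-style module: the output
# projection matrix lives in its .weight parameter.
class ParallelLMHead(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, hidden_size))

head = ParallelLMHead(vocab_size=8, hidden_size=4)
last_hidden = torch.randn(2, 4)

# Buggy call: the module itself has no .T attribute.
# last_logits = torch.matmul(last_hidden, head.T)  # AttributeError

# Working call: transpose the underlying weight tensor instead.
last_logits = torch.matmul(last_hidden, head.weight.T)
print(last_logits.shape)  # torch.Size([2, 8])
```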

@aflah02 aflah02 marked this pull request as draft February 7, 2024 21:08

aflah02 commented Feb 7, 2024

Update: I just noticed the missing piece and fixed it, which resolves the old issue, but now I get a new error:

Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 165, in exposed_step
    self.forward_step()
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 192, in forward_step
    self.forward_decode_batch(self.running_batch)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 429, in forward_decode_batch
    next_token_ids, next_token_probs = batch.sample(logits)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/infer_batch.py", line 452, in sample
    sampled_index = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
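The failing call is `torch.multinomial`, which rejects probability tensors containing NaN, inf, or negative entries. A small diagnostic sketch (not sglang's actual code) that can be dropped in just before the sampling call to confirm which condition is violated:

```python
import torch

def diagnose_probs(probs: torch.Tensor) -> list:
    """Return the reasons torch.multinomial would reject this tensor."""
    problems = []
    if torch.isnan(probs).any():
        problems.append("nan")
    if torch.isinf(probs).any():
        problems.append("inf")
    if (probs < 0).any():
        problems.append("negative")
    return problems

bad = torch.tensor([0.5, float("nan"), -0.1])
print(diagnose_probs(bad))  # ['nan', 'negative']
```

An empty list means the tensor is safe to sample from; anything else indicates the bad values were produced earlier in the forward pass.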


aflah02 commented Feb 7, 2024

I did some further testing. It runs perfectly for https://github.com/aflah02/sglang/blob/main/examples/usage/choices_logprob.py
but fails for https://github.com/aflah02/sglang/blob/main/examples/quick_start/srt_example_chat.py with the error above.
It seems the issue might be elsewhere.

@aflah02 aflah02 marked this pull request as ready for review February 7, 2024 21:26
@aflah02 aflah02 changed the title from "Adding GPT-NeoX" to "[WIP] Adding GPT-NeoX" Feb 7, 2024

aflah02 commented Feb 9, 2024

@merrymercy Any thoughts? I'm not sure why one tutorial works while the other doesn't.


merrymercy commented Feb 11, 2024

@aflah02

  1. Can you try this tutorial? https://github.com/sgl-project/sglang/blob/main/examples/quick_start/srt_example_complete.py
    The chat example may not work properly because of the chat template: the default chat template is Vicuna's, but GPT-NeoX has not been tuned on that template.
  2. Can you add more print statements to see where the NaN comes from? Does it occur in the early transformer layers, or only in the last layer?
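The print-statement hunt in point 2 can be automated with forward hooks: run one forward pass and record which submodules emit non-finite outputs. A sketch under the assumption that the model is a plain `nn.Module` (the toy `Sequential` below is illustrative only):

```python
import torch
import torch.nn as nn

def find_nonfinite_layers(model: nn.Module, *inputs) -> list:
    """Run one forward pass and return the names of submodules whose
    output contains NaN or inf -- useful for locating where bad values
    first appear in the stack."""
    offenders = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(*inputs)
    for h in handles:
        h.remove()
    return offenders

# Toy demo: corrupt the last layer's weights and see only it get flagged.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
with torch.no_grad():
    model[2].weight.fill_(float("inf"))
print(find_nonfinite_layers(model, torch.randn(1, 4)))  # ['2']
```

The first name in the returned list is the earliest layer producing bad values, which narrows the search considerably compared to scattering prints by hand.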


aflah02 commented Feb 20, 2024

@merrymercy
For part 2, the error seems to occur mainly in the last layer or the last few layers. Some of the logs for the chat example are here: logs.txt
The first tutorial also gives a similar error.


aflah02 commented Mar 1, 2024

@merrymercy Any thoughts on what might be going wrong here? I don't see how a chat template alone could cause such a breaking issue.

@merrymercy

@aflah02 I have no idea. I typically debug these kinds of weird bugs by comparing intermediate tensors layer by layer between the sglang and huggingface/vllm implementations, similar to your print statements.
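The layer-by-layer comparison can be made mechanical: capture the per-layer hidden states from a reference implementation (e.g. HuggingFace) and from the implementation under debug, then find the first layer where they diverge. A generic sketch assuming both sets of states have already been collected into lists of tensors (the capture step itself is implementation-specific and omitted):

```python
import torch

def compare_layerwise(ref_states, test_states, atol=1e-3):
    """Compare per-layer hidden states from a reference implementation
    against the one under debug; return the index and max absolute error
    of the first diverging layer, or (None, 0.0) if they all match."""
    for i, (ref, test) in enumerate(zip(ref_states, test_states)):
        err = (ref - test).abs().max().item()
        if err > atol:
            return i, err
    return None, 0.0

# Toy demo with an injected divergence at layer 1.
ref = [torch.zeros(2, 4) for _ in range(3)]
test = [t.clone() for t in ref]
test[1] += 0.5
print(compare_layerwise(ref, test))  # (1, 0.5)
```

Once the first diverging layer is known, the bug hunt narrows to that layer's weight loading or its forward computation.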


aflah02 commented Jun 12, 2024

@merrymercy Sorry for being inactive; life got really busy over the past few months. I don't have the bandwidth to take this on right now, so feel free to work on it if you'd like.

@merrymercy

I will close this for now

@merrymercy merrymercy closed this Jun 12, 2024