Support Deepseek MoE #2429

Closed
wants to merge 2 commits into from
Conversation

@esmeetu (Collaborator) commented Jan 12, 2024

Model info:

https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat

The current implementation generates garbled text, and I need some help.

Test code:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "def greet"
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

# Create an LLM.
llm = LLM(model="deepseek-ai/deepseek-moe-16b-chat", dtype="half", enforce_eager=True, tensor_parallel_size=4, gpu_memory_utilization=0.95, trust_remote_code=True)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.

outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Old output:

Prompt: 'def greet', Generated text: " 家4《0“d人 同一-开li...网是 下 ( 不是\n同:手望=你在同是是哪个字�d/ 3,1\n02的宇\n市、Google 3一时中国下在那里\n Me、、 -分 f\n\n小的高维AF'A 的 是(入\n 是很 名\n人 >>折料\n2的留 当\n2下\nof再\n的 狗了顾团队\n1手、重 了\n一些有\n一个在\n12"�

Updated output:

Prompt: 'def greet', Generated text: '(name):\n    print("Hello, " + name + "!")\n\ngreet("Alice")'

Additionally, the model's chat template:

{% for message in messages %}
{% if message['role'] == 'user' %}
User: {{ message['content']|trim -}}
{% if not loop.last %}

{% endif %}
{% elif message['role'] == 'assistant' %}
Assistant: {{ message['content']|trim -}}{{ eos_token }}
{% if not loop.last %}

{% endif %}
{% endif %}
{% endfor %}
{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}

Assistant: {% endif %}
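
For reference, a minimal sketch of rendering this template with transformers' apply_chat_template; the compacted template string, the example message, and the variable names are illustrative, not part of this PR:

# Minimal sketch, assuming the Jinja template above compacted into one string.
from transformers import AutoTokenizer

chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}User: {{ message['content']|trim -}}"
    "{% if not loop.last %}\n\n{% endif %}"
    "{% elif message['role'] == 'assistant' %}"
    "Assistant: {{ message['content']|trim -}}{{ eos_token }}"
    "{% if not loop.last %}\n\n{% endif %}"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}"
    "\n\nAssistant: {% endif %}"
)

tok = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-moe-16b-chat", trust_remote_code=True
)
tok.chat_template = chat_template

messages = [{"role": "user", "content": "Write a greet function in Python."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # roughly: "User: Write a greet function in Python.\n\nAssistant: "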

@zhuohan123 (Member) commented:

Can you compare with HF implementation by printing the tensors layer by layer to see where the results become off? This is typically how we debug this kind of issue.
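
One plausible way to do this layer-by-layer comparison is to register forward hooks on the HF reference model and dump the per-layer outputs for diffing against prints added inside the vLLM model; the module-name filter, dump file name, and prompt below are assumptions, not vLLM tooling:

# Minimal sketch: capture per-layer outputs from the HF reference implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-moe-16b-chat"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        captured[name] = out.detach().float().cpu()
    return hook

# Hook the attention and MLP submodules so the first diverging layer is easy to spot.
for name, module in model.named_modules():
    if name.endswith("mlp") or name.endswith("self_attn"):
        module.register_forward_hook(make_hook(name))

ids = tok("def greet", return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**ids)

torch.save(captured, "hf_layer_outputs.pt")  # compare against tensors printed from vLLM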

@esmeetu (Collaborator, Author) commented Jan 12, 2024

> Can you compare with HF implementation by printing the tensors layer by layer to see where the results become off? This is typically how we debug this kind of issue.

OK, I will try this.

@esmeetu esmeetu marked this pull request as draft January 13, 2024 01:14
@esmeetu esmeetu changed the title from "[WIP] Support Deepseek MoE (Need Help)" to "Support Deepseek MoE" on Jan 13, 2024
@esmeetu esmeetu marked this pull request as ready for review January 13, 2024 15:16
@esmeetu (Collaborator, Author) commented Jan 13, 2024

Hi @zhuohan123, I went back to the official MoE implementation. There should still be room for improvement by adopting expert parallelism. This PR is ready for review, and I will explore that optimization in a future PR. cc @WoosukKwon
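
To make the expert-parallelism idea concrete, here is a toy single-process sketch: each simulated rank owns a slice of experts, computes partial outputs only for tokens routed to its local experts, and the partial sums are combined (which a real implementation would do with an all-reduce across devices). The stand-in router, shapes, and per-expert Linear layers are illustrative, not DeepSeek's actual architecture:

# Toy sketch of expert parallelism; not vLLM's or DeepSeek's implementation.
import torch

num_experts, num_ranks, hidden = 8, 4, 16
experts_per_rank = num_experts // num_ranks

# One tiny Linear per expert; real DeepSeek MoE experts are gated MLPs.
experts = [torch.nn.Linear(hidden, hidden, bias=False) for _ in range(num_experts)]

def moe_forward(x: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    # Stand-in router: softmax over experts, keep the top-k per token.
    logits = torch.randn(x.shape[0], num_experts)
    weights, idx = torch.topk(torch.softmax(logits, dim=-1), top_k, dim=-1)

    out = torch.zeros_like(x)
    for rank in range(num_ranks):                    # each iteration plays one device
        lo, hi = rank * experts_per_rank, (rank + 1) * experts_per_rank
        partial = torch.zeros_like(x)
        for e in range(lo, hi):                      # only experts owned by this rank
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel():
                w = weights[rows, slots].unsqueeze(-1)
                partial[rows] += w * experts[e](x[rows])
        out += partial                               # an all-reduce in a real EP setup
    return out

print(moe_forward(torch.randn(5, hidden)).shape)     # torch.Size([5, 16])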

@zwd003 (Contributor) commented Jan 16, 2024

Thank you for your support of DeepSeek MoE; we will subsequently release an optimized inference version.

@esmeetu (Collaborator, Author) commented Jan 16, 2024

> Thank you for your support of DeepSeek MoE; we will subsequently release an optimized inference version.

Hi @zwd003, I am happy to hear from the DeepSeek team about this. Looking forward to seeing your commit as soon as possible, and I hope it brings a great performance boost. 💪
