Manual model warmup to resolve AOT model warmup performance degradation #126

Merged
7 commits merged into AI-Hypercomputer:main on Aug 14, 2024

Conversation

vivianrwu (Contributor)

Use manual model warmup instead of the AOT-implemented model warmup, since with AOT we observe performance degradation at higher batch sizes of the MaxText configuration, as reported in #92:

  1. OOM at higher batch sizes (after model warmup, during an active request)
  2. Exponentially slower detokenizing generate step time at higher batch sizes

We have verified that with manual warmup the detokenizing generate step time matches JetStream's optimal behavior for all batch sizes:

curl --request POST --header "Content-type: application/json" \
  -s localhost:8000/generate --data '{
    "prompt": "What are the top 5 programming languages",
    "max_tokens": 200
}'
{
    "response": " for data science in 2023?\n\n1. Python\n2. R\n3. SQL\n4. Java\n5. Scala\n\n**Note:** The order is based on popularity and demand in the data science industry in 2023."
}
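
For context, here is a minimal sketch of what a manual warmup loop can look like. This is not the code in this PR; the engine method names and signatures (load_params, init_decode_state, prefill, insert, generate) are assumptions modeled on the JetStream engine interface.

import jax.numpy as jnp


def manual_warmup(engine, prefill_lengths=(64, 256, 1024)):
    """Run dummy prefill and generate steps so XLA compiles the model eagerly.

    Unlike the AOT path, no standalone compiled executables are retained;
    the engine's normal JIT cache keeps the traced programs for later requests.
    """
    params = engine.load_params()
    decode_state = engine.init_decode_state()

    for length in prefill_lengths:
        # Dummy tokens at each padded prefill bucket trigger compilation of
        # the prefill program for that input shape.
        dummy_tokens = jnp.zeros((length,), dtype=jnp.int32)
        prefix, _ = engine.prefill(
            params=params, padded_tokens=dummy_tokens, true_length=length)
        decode_state = engine.insert(prefix, decode_state, slot=0)

    # A single generate step compiles the decode program at the configured
    # batch size.
    decode_state, _ = engine.generate(params, decode_state)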

@JoeZijunZhou (Collaborator) left a comment


Do we need to update unit tests?

QQ on the description:

  1. We set the max per-device batch size (pdbs) when we start the server; this value should be within the memory cap (based on a calculation with the devices used), so it would not OOM, right?
  2. Why would a higher actual batch size have very slow detokenization? Could you share some investigation or profiles?

jetstream/core/orchestrator.py (review thread resolved)
vivianrwu (Contributor, Author) commented Aug 7, 2024

Do we need to update unit tests?

Unit tests do not need to be updated because the new warmup path is gated on the engine.warm condition; a minimal sketch of that gating is below.
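
(Illustrative only: the loop below and the manual_warmup helper are hypothetical, based on the engine.warm flag mentioned above; the actual server startup code in this PR may differ.)

# Warmup is skipped for engines that are already warm, so unit tests that
# run against warm engines never hit the new manual warmup path.
for engine in engines:
    if not engine.warm:
        manual_warmup(engine)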

QQ on the description:

  1. We set the max per-device batch size (pdbs) when we start the server; this value should be within the memory cap (based on a calculation with the devices used), so it would not OOM, right?

Yes; I think storing the compiled graphs from AOT as executables and then executing them is what takes up the extra memory. We observe the OOM at the generate request.
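
As a generic illustration of that point (plain JAX, not JetStream code): the compiled executable returned by lower(...).compile() is held in the Python process for as long as it is referenced, on top of whatever is written to the compilation cache directory.

import jax
import jax.numpy as jnp


def decode_step(state, tokens):
    # Stand-in for a real generate step.
    return state + jnp.sum(tokens, axis=-1, keepdims=True).astype(state.dtype)


state = jnp.zeros((8, 1), dtype=jnp.float32)
tokens = jnp.zeros((8, 1024), dtype=jnp.int32)

lowered = jax.jit(decode_step).lower(state, tokens)
compiled = lowered.compile()   # AOT: the executable lives as a Python object
out = compiled(state, tokens)  # resident for as long as `compiled` is referenced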

  2. Why would a higher actual batch size have very slow detokenization? Could you share some investigation or profiles?

Yes, you can refer to #92 for some of the investigation. I also shared the doc internally.

@FanhaiLu1 (Collaborator)

We have verified that with manual warmup the detokenizing generate step time matches JetStream's optimal behavior for all batch sizes.

Did you figure out the root cause of the performance issue and OOM for AOT?

vivianrwu (Contributor, Author)

We have verified that with manual warmup the detokenizing generate step time matches JetStream's optimal behavior for all batch sizes.

Did you figure out the root cause of the performance issue and OOM for AOT?

We attempted an RCA; the root cause of the OOM is potentially the added space needed to store the compiled graphs as executables, alongside saving the cache in the compilation cache directory. The root cause of the performance issue has not been determined; it could be suboptimal AOT executables. I can share the investigation offline.

@JoeZijunZhou JoeZijunZhou merged commit 59538fc into AI-Hypercomputer:main Aug 14, 2024
3 checks passed