Support SDXL and its distributed inference #1514
Conversation
@Zars19 thanks for the contribution to TensorRT-LLM! @nv-guomingz can you help take care of this? :) Thanks
Sure, I'll collaborate with @Zars19 on enabling SDXL with TRT-LLM.
Hi @Zars19, could you please resolve the code conflicts first?
I have resolved the conflict :) @nv-guomingz
Hi @Zars19 thanks for your patience. |
@nv-guomingz I completed the git rebase |
Any updates on the code review? I haven't received feedback for a while since rebasing the code.
The idea of patch parallelism comes from the CVPR 2024 paper DistriFusion. To reduce implementation complexity, all communication in the example is synchronous.
This helps SDXL achieve better performance, especially at very high resolutions.
Benchmark setup: A100, 50 steps, 2048x2048, SDXL
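The synchronous patch-parallel scheme described above can be sketched roughly as follows. This is a minimal single-process simulation, not the PR's actual implementation: the latent is split along the height dimension, each simulated rank denoises its own patch, and a synchronous all-gather-style concatenation reassembles the full latent before the next step. The `denoise` function is a hypothetical stand-in for one SDXL UNet step.

```python
import numpy as np

def run_patch_parallel_step(latent, denoise_fn, world_size):
    # Split the latent along the height dimension: one patch per "GPU" (rank).
    # In a real distributed setup each rank would hold only its own patch;
    # here all ranks are simulated in a single process.
    patches = np.array_split(latent, world_size, axis=0)
    # Each rank denoises its patch independently.
    outputs = [denoise_fn(p) for p in patches]
    # Synchronous communication: an all-gather reassembles the full latent
    # on every rank before the next step (no async overlap).
    return np.concatenate(outputs, axis=0)

# Toy "denoiser" standing in for one SDXL UNet step (hypothetical).
denoise = lambda x: x * 0.5

latent = np.ones((8, 8))
out = run_patch_parallel_step(latent, denoise, world_size=4)
```

In the real DistriFusion design, patch boundaries need neighboring activations, which the paper handles with stale activations from the previous step; a fully synchronous variant like this one simply trades that overlap away for simplicity.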