Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2411.01738 (cs)

[Submitted on 4 Nov 2024]

Title: xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Authors: Jiarui Fang, Jinzhe Pan, Xibo Sun, Aoyu Li, Jiannan Wang

Abstract: Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, an architecture known as Diffusion Transformers (DiTs). However, generating high-quality content requires longer sequence lengths, which quadratically increases the computation required for the attention mechanism and raises DiT inference latency. Parallel inference is essential for real-time DiT deployments, but relying on a single parallel method is impractical because it scales poorly to large device counts. This paper introduces xDiT, a comprehensive parallel inference engine for DiTs. After thoroughly investigating existing DiT parallel approaches, xDiT chooses Sequence Parallel (SP) and PipeFusion, a novel patch-level pipeline parallel method, as intra-image parallel strategies, alongside CFG parallel for inter-image parallelism. xDiT can flexibly combine these parallel approaches in a hybrid manner, offering a robust and scalable solution. Experimental results on two 8xL40 GPU (PCIe) nodes interconnected by Ethernet and on an 8xA100 (NVLink) node showcase xDiT's scalability across five state-of-the-art DiTs. Notably, we are the first to demonstrate DiT scalability on Ethernet-connected GPU clusters. xDiT is available at this https URL.
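To make the idea of combining parallel approaches in a hybrid manner concrete, the following is a minimal sketch that enumerates ways to split a fixed number of GPUs across CFG parallel, Sequence Parallel, and PipeFusion. It is not xDiT's API: the function name `hybrid_configs` and the degree names are hypothetical, and it assumes that the three degrees multiply to the total device count and that the CFG degree is 1 or 2 (one conditional and one unconditional branch).

```python
def hybrid_configs(world_size, cfg_choices=(1, 2)):
    """Enumerate (cfg, sp, pipefusion) degree triples for `world_size` GPUs.

    Assumption (not taken from the paper): the degrees multiply to
    world_size, and CFG parallel uses at most 2 devices.
    """
    configs = []
    for cfg in cfg_choices:
        if world_size % cfg != 0:
            continue  # this CFG degree does not divide the device count
        rest = world_size // cfg
        # Split the remaining devices between SP and PipeFusion.
        for sp in range(1, rest + 1):
            if rest % sp == 0:
                configs.append((cfg, sp, rest // sp))
    return configs


if __name__ == "__main__":
    # 16 GPUs, e.g. two 8-GPU nodes as in the experiments described above.
    for cfg, sp, pipefusion in hybrid_configs(16):
        print(f"cfg={cfg} sp={sp} pipefusion={pipefusion}")
```

Which of these configurations is fastest depends on the interconnect and the model; the sketch only lists the search space and does not rank the options.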

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.01738 [cs.DC]
	(or arXiv:2411.01738v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2411.01738

Submission history

From: Jiarui Fang
[v1] Mon, 4 Nov 2024 01:40:38 UTC (17,050 KB)
