Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Shengqi Liu1, Yuhao Cheng1, Zhuo Chen1, Xingyu Ren1, Wenhan Zhu2,

Lincheng Li3, Mengxiao Bi3, Xiaokang Yang1, Yichao Yan1†

1MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University,

2 Xueshen AI, 3 NetEase Fuxi AI Lab

†Corresponding author
Teaser

SewingLDM generates complex sewing pattern designs conditioned on text prompts, garment sketches, and body shapes, demonstrating detailed control. The generated garments can be seamlessly integrated into the CG pipeline for simulation and animation, achieving vivid and photo-realistic rendering results.

Abstract

Generating sewing patterns in garment design is receiving increasing attention due to their CG-friendly and flexibly editable nature. Previous sewing pattern generation methods can produce exquisite clothing but struggle to design complex garments with detailed control. To address these issues, we propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches. We first extend the original vector representation of sewing patterns into a more comprehensive one that covers more intricate details, and then compress it into a compact latent space. To learn the sewing pattern distribution in this latent space, we design a two-step training strategy that injects the multi-modal conditions, i.e., body shapes, text prompts, and garment sketches, into a diffusion model, ensuring the generated garments are body-suited and detail-controlled. Comprehensive qualitative and quantitative experiments demonstrate the effectiveness of our method, which significantly surpasses previous approaches in complex garment design and adaptability to various body shapes.
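To give a rough, hypothetical picture of what such an extended sewing-pattern vector could look like, the sketch below serializes a garment as a fixed-size array of panels, edges, and 3D placements before compression into the latent space. Field names, dimensions, and the padding scheme are assumptions for exposition, not the paper's exact format.

```python
# Hypothetical sewing-pattern representation (not the paper's exact format).
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class Edge:
    start: Tuple[float, float]                 # 2D start point in panel space
    end: Tuple[float, float]                   # 2D end point
    curvature: Tuple[float, float] = (0.0, 0.0)  # control-point offset for curved edges

@dataclass
class Panel:
    edges: List[Edge]
    rotation: Tuple[float, float, float]       # 3D placement of the panel around the body
    translation: Tuple[float, float, float]

def to_tensor(panels: List[Panel], max_panels: int = 20, max_edges: int = 16) -> np.ndarray:
    """Pad every panel to a fixed edge count and every garment to a fixed panel
    count so the whole pattern becomes one dense vector that an encoder can
    compress into a compact latent code."""
    edge_feat = np.zeros((max_panels, max_edges, 6), dtype=np.float32)
    placement = np.zeros((max_panels, 6), dtype=np.float32)
    for i, p in enumerate(panels[:max_panels]):
        placement[i] = [*p.rotation, *p.translation]
        for j, e in enumerate(p.edges[:max_edges]):
            edge_feat[i, j] = [*e.start, *e.end, *e.curvature]
    return np.concatenate([edge_feat.reshape(max_panels, -1), placement], axis=1).reshape(-1)
```

Padding to fixed panel and edge counts is one simple way to obtain a fixed-length vector suitable for an encoder that maps patterns into a compact latent space.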

Video

Method Overview


Multimodal latent diffusion model. To balance the multi-modal conditions and allow new conditions to be added in the future, we design a two-step training strategy: 1) in the first step, we train the latent diffusion model under text guidance only; 2) in the second step, we embed the knowledge of body shapes and garment sketches into the diffusion model for detailed control and body-suited garment generation. We fuse the features of sketches and body shapes and inject them into the diffusion model via normalization, fine-tuning only a minimal set of the diffusion model's parameters. Trained network parameters are depicted in orange; frozen parameters are shown in purple. The output latent is then quantized into the designed latent space and fed to the decoder, which yields all edge lines. The edge lines are connected end to end to form panels, which are placed on the corresponding body regions. Finally, the fitted garments are obtained through the modern CG pipeline. A schematic sketch of the two-step strategy follows below.
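The following PyTorch-style sketch is only a schematic of the two-step strategy, not the released implementation; the module names, dimensions, and the (scale, shift) normalization-based injection are assumptions made for illustration.

```python
# Schematic two-step training sketch (assumed structure, not the authors' code).
import torch
import torch.nn as nn

LATENT_DIM, COND_DIM = 256, 512

class DenoiserBlock(nn.Module):
    """One block whose normalized features can be modulated by a per-sample
    (scale, shift) derived from the fused sketch + body-shape conditions."""
    def __init__(self):
        super().__init__()
        self.norm = nn.LayerNorm(LATENT_DIM)
        self.mlp = nn.Sequential(nn.Linear(LATENT_DIM, LATENT_DIM * 4),
                                 nn.GELU(),
                                 nn.Linear(LATENT_DIM * 4, LATENT_DIM))

    def forward(self, x, scale=None, shift=None):
        h = self.norm(x)
        if scale is not None:                      # step 2: inject extra conditions
            h = h * (1 + scale) + shift
        return x + self.mlp(h)

class Denoiser(nn.Module):
    def __init__(self, n_blocks=4):
        super().__init__()
        self.text_proj = nn.Linear(COND_DIM, LATENT_DIM)      # step-1 text conditioning
        self.blocks = nn.ModuleList(DenoiserBlock() for _ in range(n_blocks))
        # Step-2 adapter: maps fused sketch + body features to (scale, shift).
        self.cond_adapter = nn.Linear(COND_DIM, 2 * LATENT_DIM)

    def forward(self, z_t, text_emb, fused_cond=None):
        x = z_t + self.text_proj(text_emb)
        scale = shift = None
        if fused_cond is not None:
            scale, shift = self.cond_adapter(fused_cond).chunk(2, dim=-1)
        for blk in self.blocks:
            x = blk(x, scale, shift)
        return x                                   # predicted noise

def train_step(model, optimizer, z0, text_emb, fused_cond=None):
    """Simplified denoising objective; fused_cond is None in step 1."""
    noise = torch.randn_like(z0)
    t = torch.rand(z0.shape[0], 1)                 # toy continuous timestep
    z_t = (1 - t) * z0 + t * noise                 # toy noising schedule
    loss = nn.functional.mse_loss(model(z_t, text_emb, fused_cond), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = Denoiser()
z0, text = torch.randn(8, LATENT_DIM), torch.randn(8, COND_DIM)

# Step 1: text-only training of the latent diffusion model.
opt1 = torch.optim.AdamW(model.parameters(), lr=1e-4)
train_step(model, opt1, z0, text)

# Step 2: freeze the backbone and train only the small adapter that injects
# fused sketch + body-shape features through normalization modulation.
for p in model.parameters():
    p.requires_grad_(False)
for p in model.cond_adapter.parameters():
    p.requires_grad_(True)
opt2 = torch.optim.AdamW(model.cond_adapter.parameters(), lr=1e-4)
fused = torch.randn(8, COND_DIM)                   # fused sketch + body-shape features
train_step(model, opt2, z0, text, fused)
```

The key point is that step 2 freezes the text-conditioned backbone and optimizes only a small adapter, so the added sketch and body-shape conditions do not disturb what was learned in step 1.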

Results

Animation Results

Comparison with Baselines

More Generation Results

BibTeX

@misc{liu2024multimodallatentdiffusionmodel,
      title={Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation},
      author={Shengqi Liu and Yuhao Cheng and Zhuo Chen and Xingyu Ren and Wenhan Zhu and Lincheng Li and Mengxiao Bi and Xiaokang Yang and Yichao Yan},
      year={2024},
      eprint={2412.14453},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.14453},
}