Generating sewing patterns for garment design is receiving increasing attention because sewing patterns are CG-friendly and easy to edit. Previous sewing pattern generation methods can produce exquisite clothing, but they struggle to design complex garments with detailed control. To address these issues, we propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches. We first extend the original sewing-pattern vector into a more comprehensive representation that covers more intricate details, and then compress it into a compact latent space. To learn the sewing-pattern distribution in this latent space, we design a two-step training strategy that injects the multi-modal conditions, i.e., body shapes, text prompts, and garment sketches, into a diffusion model, ensuring the generated garments are body-suited and detail-controlled. Comprehensive qualitative and quantitative experiments demonstrate the effectiveness of our method, which significantly surpasses previous approaches in complex garment design and adaptability to various body shapes.
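As a rough illustration of the representation-compression stage, the sketch below (our own simplification, not the paper's released code) compresses per-edge sewing-pattern vectors into a compact latent space with a small transformer autoencoder. The tensor sizes MAX_EDGES, EDGE_DIM, and LATENT_DIM and the class name PatternAutoencoder are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' implementation) of compressing
# extended per-edge sewing-pattern vectors into a compact latent space.
import torch
import torch.nn as nn

MAX_EDGES, EDGE_DIM, LATENT_DIM = 256, 16, 64  # hypothetical sizes

class PatternAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(EDGE_DIM, 128)
        enc_layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.to_latent = nn.Linear(128, LATENT_DIM)
        self.from_latent = nn.Linear(LATENT_DIM, 128)
        dec_layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=4)
        self.to_edges = nn.Linear(128, EDGE_DIM)

    def encode(self, edges):   # edges: (B, MAX_EDGES, EDGE_DIM)
        return self.to_latent(self.encoder(self.embed(edges)))

    def decode(self, latent):  # latent: (B, MAX_EDGES, LATENT_DIM)
        return self.to_edges(self.decoder(self.from_latent(latent)))

    def forward(self, edges):
        return self.decode(self.encode(edges))

# Usage: x = torch.randn(2, MAX_EDGES, EDGE_DIM); recon = PatternAutoencoder()(x)
```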
Multimodal latent diffusion model. To balance the multi-modal conditions and allow future conditions to be added easily, we design a two-step training strategy: 1) in the first step, we train the latent diffusion model under text guidance only; 2) in the second step, we embed the knowledge of body shapes and garment sketches into the diffusion model for detailed control and body-suited garment generation. We fuse the features of sketches and body shapes and inject them into the diffusion model through its normalization layers, fine-tuning only a minimal set of the diffusion model's parameters. Trained network parameters are depicted in orange, while frozen parameters are shown in purple. The output latent is then quantized into the designed latent space and serves as the input of the decoder, which yields all edge lines. The edge lines connect end to end to form panels, which are placed on the corresponding body regions. Finally, we obtain well-fitted garments through a modern CG pipeline.
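The second training step can be pictured roughly as follows. This is a hedged sketch under our own assumptions (the dimensions, the class names ConditionFusion and AdaNormBlock, and the use of an adaptive-normalization wrapper are illustrative), not the released implementation: it fuses sketch and body-shape features into one condition vector and injects it into a frozen diffusion block through a normalization layer, so only the small modulation module is fine-tuned.

```python
# Minimal sketch (assumptions, not the released code) of condition injection
# via normalization with a frozen diffusion block and minimal tuned parameters.
import torch
import torch.nn as nn

HIDDEN, SKETCH_DIM, BODY_DIM = 128, 512, 10  # hypothetical sizes

class ConditionFusion(nn.Module):
    """Fuses sketch and body-shape embeddings into a single condition vector."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(SKETCH_DIM + BODY_DIM, HIDDEN), nn.SiLU(), nn.Linear(HIDDEN, HIDDEN)
        )

    def forward(self, sketch_feat, body_shape):
        return self.proj(torch.cat([sketch_feat, body_shape], dim=-1))

class AdaNormBlock(nn.Module):
    """Frozen diffusion block wrapped with a trainable condition-driven scale/shift."""
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        for p in self.block.parameters():          # keep the pretrained block frozen
            p.requires_grad_(False)
        self.norm = nn.LayerNorm(HIDDEN, elementwise_affine=False)
        self.modulation = nn.Linear(HIDDEN, 2 * HIDDEN)  # the only tuned parameters

    def forward(self, x, cond):                    # x: (B, T, HIDDEN), cond: (B, HIDDEN)
        scale, shift = self.modulation(cond).chunk(2, dim=-1)
        x = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return self.block(x)

# Usage sketch:
# fuse = ConditionFusion()
# block = AdaNormBlock(nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=8, batch_first=True))
# cond = fuse(torch.randn(2, SKETCH_DIM), torch.randn(2, BODY_DIM))
# out = block(torch.randn(2, 64, HIDDEN), cond)
```

In this picture, the text-conditioned backbone trained in the first step stays frozen, and only the fusion and modulation modules are optimized, mirroring the parameter-efficient fine-tuning described above.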
@misc{liu2024multimodallatentdiffusionmodel,
      title={Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation},
      author={Shengqi Liu and Yuhao Cheng and Zhuo Chen and Xingyu Ren and Wenhan Zhu and Lincheng Li and Mengxiao Bi and Xiaokang Yang and Yichao Yan},
      year={2024},
      eprint={2412.14453},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.14453},
}