ITEM3D: Directional Texture Editing for 3D Models

Shengqi Liu1*, Zhuo Chen1*, Jingnan Gao1, Yichao Yan1, Wenhan Zhu1, Jiangjing Lyu2, Xiaokang Yang1
*Equal contribution

1MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, 2 Alibaba Group


Abstract

Texture editing is a crucial task in 3D modeling that allows users to automatically manipulate the surface materials of 3D models. However, the inherent complexity of 3D models and the ambiguity of text descriptions make this task challenging. To address this challenge, we propose ITEM3D, a Texture Editing Model designed for automatic 3D object editing according to text Instructions. Leveraging diffusion models and differentiable rendering, ITEM3D uses rendered images as a bridge between text and the 3D representation, and further optimizes the disentangled texture and environment map. Previous methods adopt an absolute editing direction, namely score distillation sampling (SDS), as the optimization objective, which unfortunately results in a noisy appearance and text inconsistency. To solve the problem caused by ambiguous text, we introduce a relative editing direction, an optimization objective defined by the noise difference between the source and target texts, to relieve the semantic ambiguity between texts and images. Additionally, we gradually adjust the direction during optimization to further address unexpected deviation in the texture domain. Qualitative and quantitative experiments show that ITEM3D outperforms state-of-the-art methods on various 3D objects. We also perform text-guided relighting to demonstrate explicit control over lighting.

Method Overview

Pipeline of 3D model editing. We render the 3D model, defined by its mesh, texture, and environment map, into 2D images, to which noise ϵ is then added. We then use the target text and the gradually adjusted source text as separate conditions to predict the added noise via the U-Net. The difference between the two predicted noises serves as the relative direction that guides the optimization of the materials and environment map.
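The relative direction described in the pipeline can be illustrated with a small numerical sketch. This is not the authors' implementation: `toy_noise_predictor` is a hypothetical stand-in for the text-conditioned U-Net, the text embeddings are plain arrays, and `alpha_bar` is an assumed noise-schedule coefficient. The sketch only shows how the two noise predictions on the same noised render are subtracted; it leaves out differentiable rendering and the gradual adjustment of the source text.

```python
import numpy as np


def add_noise(image, eps, alpha_bar):
    """Forward diffusion step: blend the rendered image with Gaussian noise eps."""
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps


def toy_noise_predictor(noisy, text_embedding):
    """Hypothetical stand-in for the text-conditioned U-Net (not a real model)."""
    return 0.5 * noisy + text_embedding.mean()


def relative_direction(image, source_emb, target_emb, alpha_bar, rng):
    """Difference of the noise predicted under the target and source texts."""
    eps = rng.standard_normal(image.shape)        # the added noise
    noisy = add_noise(image, eps, alpha_bar)      # same noised render for both passes
    eps_target = toy_noise_predictor(noisy, target_emb)
    eps_source = toy_noise_predictor(noisy, source_emb)
    return eps_target - eps_source                # relative editing direction


rng = np.random.default_rng(0)
render = np.zeros((4, 4, 3))                      # placeholder for a rendered view
source_emb = np.array([0.0, 0.0])                 # placeholder source-text embedding
target_emb = np.array([1.0, 3.0])                 # placeholder target-text embedding
direction = relative_direction(render, source_emb, target_emb, 0.5, rng)
```

In a full system this direction would be back-propagated through the differentiable renderer to update the texture and environment map. Note that when the source and target conditions coincide, the direction vanishes, so no edit is applied.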


Results on Real-world Dataset

"A vegetable toy tiger" "A golden sneaker"
"A Swiss bag" "A Nike sneaker"
"A Tiffany blue bag" "A pineapple-like hat"
"A cap with stars on it" "A Red Bull energy drink"
"A Mallard" "A pink porcelain piggy toy"

BibTeX

@misc{liu2024directional,
      title={Directional Texture Editing for 3D Models}, 
      author={Shengqi Liu and Zhuo Chen and Jingnan Gao and Yichao Yan and Wenhan Zhu and Jiangjing Lyu and Xiaokang Yang},
      year={2024},
      eprint={2309.14872},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}