MatFuse: Controllable Material Generation with Diffusion Models

Giuseppe Vecchio*, Renato Sortino*, Simone Palazzo, Concetto Spampinato
University of Catania
CVPR 2024

*Indicates Equal Contribution

Animated renderings of materials generated using MatFuse.

Abstract

MatFuse teaser

Sample scenes textured using materials generated with MatFuse. For each of the three scenes we show the materials used and the final rendering.

Creating high-quality materials in computer graphics is a challenging and time-consuming task, which requires great expertise. To simplify this process, we introduce MatFuse, a unified approach that harnesses the generative power of diffusion models for the creation and editing of 3D materials. Our method integrates multiple sources of conditioning, including color palettes, sketches, text, and pictures, enhancing creative possibilities and granting fine-grained control over material synthesis. Additionally, MatFuse enables map-level material editing capabilities through latent manipulation by means of a multi-encoder compression model which learns a disentangled latent representation for each map. We demonstrate the effectiveness of MatFuse under multiple conditioning settings and explore the potential of material editing. Finally, we assess the quality of the generated materials both quantitatively, in terms of CLIP-IQA and FID scores, and qualitatively, by conducting a user study. Source code for training MatFuse and supplementary materials are publicly available at https://gvecchio.com/matfuse.

Method

MatFuse architecture

MatFuse is a unified, multi-conditional method leveraging the generation capabilities of diffusion models to tackle the task of high-quality material synthesis as a set of SVBRDF maps.
It relies on a multi-encoder extension of the auto-encoder, using four separate encoders to learn a map-specific latent space for each SVBRDF map, which enables editing capabilities.
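The multi-encoder design can be sketched as follows. This is a minimal NumPy illustration of the shapes only: the toy `encode` function (average pooling), the downsampling factor, and the latent channel count are assumptions for illustration, not the learned CNN encoders used by MatFuse.

```python
import numpy as np

def encode(map_hwc, factor=8, latent_ch=2):
    """Toy stand-in for one of the four map encoders: average-pool
    the map by `factor` and keep `latent_ch` channels. The real
    encoders are learned networks; this only illustrates how each
    map gets its own latent block."""
    h, w, c = map_hwc.shape
    pooled = map_hwc.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return pooled[..., :latent_ch]  # shape (h/factor, w/factor, latent_ch)

# One latent per SVBRDF map, concatenated along the channel axis,
# so each channel block corresponds to exactly one material property.
keys = ("diffuse", "normal", "roughness", "specular")
maps = {k: np.random.rand(256, 256, 3) for k in keys}
latents = {k: encode(maps[k]) for k in keys}
z = np.concatenate([latents[k] for k in keys], axis=-1)
print(z.shape)  # (32, 32, 8)
```

Because the channel blocks of `z` map one-to-one to material maps, a downstream edit can target a single property without disturbing the others.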

Multi-Conditional generation

Generation

MatFuse accepts multiple sources of conditioning, including color palettes, sketches, text, and pictures. These signals can be combined freely, broadening creative possibilities and granting fine-grained control over material synthesis.
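One common way to let a diffusion model accept any subset of conditioning signals is to drop each condition independently during training, so the model learns to generate with or without it. The sketch below illustrates that general technique; the embedding sizes, dropout probability, and concatenation scheme are assumptions for illustration, and the exact conditioning mechanism in MatFuse may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def combine_conditions(embeddings, drop_p=0.1, train=True):
    """Concatenate per-source condition embeddings (e.g. palette,
    sketch, text, picture). During training, each source is
    independently zeroed with probability drop_p so the model
    learns to handle any subset of conditions at inference."""
    out = []
    for e in embeddings:
        if train and rng.random() < drop_p:
            e = np.zeros_like(e)  # dropped condition
        out.append(e)
    return np.concatenate(out, axis=-1)

# Hypothetical embedding sizes for two conditioning sources.
palette_emb = rng.normal(size=(1, 16))
text_emb = rng.normal(size=(1, 64))
cond = combine_conditions([palette_emb, text_emb], train=False)
print(cond.shape)  # (1, 80)
```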

Material Editing

Material editing

By encoding each map separately, the multi-encoder architecture lets the model learn a disentangled latent representation of each material map, which is hardly achievable with a single encoder. Since we know which material property each part of the latent space encodes, we can manipulate those parts selectively, enabling an unprecedented level of material editing. We propose a novel "volumetric inpainting" approach that jointly masks portions of the noise tensor along both the spatial and channel dimensions.
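The masking idea can be sketched as follows. This is a shape-level NumPy illustration under assumed dimensions (latent of shape `(C, H, W)` with 2-channel blocks per map); the actual denoising loop, which would re-impose the known region from the source latent at every step, is omitted.

```python
import numpy as np

# Assumed latent layout: channel blocks of size 2 correspond, in
# order, to diffuse / normal / roughness / specular.
C, H, W, BLOCK = 8, 32, 32, 2
z_src = np.random.rand(C, H, W)   # latent of the material being edited
noise = np.random.randn(C, H, W)  # fresh noise for the region to regenerate

# Volumetric mask: 1 where content is regenerated, 0 where it is kept.
# Here we mask the roughness channel block AND a central spatial patch,
# i.e. the mask spans both the channel and spatial dimensions.
mask = np.zeros((C, H, W))
mask[2 * BLOCK:3 * BLOCK, 8:24, 8:24] = 1.0

# Initial masked mixture: noise only inside the selected volume.
z_t = mask * noise + (1.0 - mask) * z_src
```

Only the roughness of the central patch would be resynthesized; the other maps, and the rest of the roughness map, are preserved from the source material.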

BibTeX

        
@inproceedings{vecchio2023matfuse,
  title={MatFuse: Controllable Material Generation with Diffusion Models},
  author={Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}