Material reconstruction from a photograph is a key component of democratizing 3D content creation. We propose to formulate this ill-posed problem as one of controlled synthesis, leveraging recent progress in generative deep networks. We present ControlMat, a method that, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information, and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials that could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space-optimization methods, and carefully validate our diffusion process design choices.
Overview of the ControlMat architecture. During training, the PBR maps are compressed into the latent representation $z$ using the encoder $\mathcal{E}$. Noise is then added to $z$, and the denoising is carried out by a U-Net model. The denoising process can be globally conditioned with the CLIP embedding of the prompt (text or image) and/or locally conditioned using the intermediate representation of a target photograph extracted by a ControlNet network. After $n$ denoising steps, the denoised latent vector $\hat{z}$ is projected back to pixel space using the decoder $\mathcal{D}$.
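The data flow in the overview above (encode, add noise, denoise for $n$ steps, decode) can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: `encode` and `decode` are hypothetical linear stand-ins for $\mathcal{E}$ and $\mathcal{D}$, the linear beta schedule and the deterministic DDIM-style update are choices made here for brevity (the text above does not specify a sampler), the 9-channel map stack is an illustrative shape, and `oracle` replaces the conditioned U-Net with a function that simply returns the true noise. The `global_cond` and `local_cond` arguments only mark where the CLIP embedding and the ControlNet features would enter.

```python
import numpy as np

def make_alpha_bars(n_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)

def add_noise(z, eps, alpha_bar_t):
    """Forward process: blend the clean latent z with Gaussian noise eps."""
    return np.sqrt(alpha_bar_t) * z + np.sqrt(1.0 - alpha_bar_t) * eps

def sample(z_T, predict_noise, alpha_bars, global_cond=None, local_cond=None):
    """Deterministic DDIM-style reverse loop over n = len(alpha_bars) steps."""
    z_t = z_T
    for t in range(len(alpha_bars) - 1, -1, -1):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # A trained U-Net would estimate the noise here, using both conditions.
        eps_hat = predict_noise(z_t, t, global_cond, local_cond)
        # Estimate the clean latent, then move to the previous noise level.
        z0_hat = (z_t - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
        z_t = np.sqrt(a_prev) * z0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return z_t

# Hypothetical linear stand-ins for the encoder E and decoder D.
def encode(maps):
    return 0.5 * maps

def decode(z):
    return 2.0 * z

rng = np.random.default_rng(0)
pbr_maps = rng.random((9, 8, 8))      # illustrative stack of map channels
z = encode(pbr_maps)                  # latent representation z
eps = rng.standard_normal(z.shape)    # Gaussian noise added to z
alpha_bars = make_alpha_bars(50)      # n = 50 denoising steps
z_noisy = add_noise(z, eps, alpha_bars[-1])

# Oracle predictor returning the true noise; it ignores both conditions.
def oracle(z_t, t, global_cond, local_cond):
    return eps

z_hat = sample(z_noisy, oracle, alpha_bars)   # denoised latent
maps_hat = decode(z_hat)                      # back to pixel space
```

Because the oracle returns the exact noise, the loop recovers the original maps up to floating-point error, which only checks that the update rule is self-consistent. With a learned noise predictor the output would instead be a plausible sample guided by the conditioning signals.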
@article{vecchio2024controlmat,
author = {Vecchio, Giuseppe and Martin, Rosalie and Roullier, Arthur and Kaiser, Adrien and Rouffet, Romain and Deschaintre, Valentin and Boubekeur, Tamy},
title = {ControlMat: A Controlled Generative Approach to Material Capture},
year = {2024},
issue_date = {October 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {43},
number = {5},
issn = {0730-0301},
url = {https://doi.org/10.1145/3688830},
doi = {10.1145/3688830},
journal = {ACM Trans. Graph.},
month = sep,
articleno = {164},
numpages = {17},
keywords = {Material appearance, capture, generative models}
}