We introduce StableMaterials, a novel approach for generating photorealistic physically-based rendering (PBR) materials that integrates semi-supervised learning with Latent Diffusion Models (LDMs). Our method employs adversarial training to distill knowledge from existing large-scale image generation models, minimizing the reliance on annotated data and enhancing generation diversity. This distillation aligns the distribution of the generated materials with that of image textures from an SDXL model, enabling the generation of novel materials that are not present in the initial training dataset. Furthermore, we employ a diffusion-based refiner model to improve the visual quality of the samples and achieve high-resolution generation. Finally, we distill a latent consistency model for fast generation in just four steps and propose a new tileability technique that removes the visual artifacts typically associated with fewer diffusion steps. We detail the architecture and training process of StableMaterials, describe the integration of semi-supervised training within existing LDM frameworks, and show the advantages of our approach. Comparative evaluations with state-of-the-art methods demonstrate the effectiveness of StableMaterials, highlighting its potential applications in computer graphics and beyond.
StableMaterials introduces an innovative approach for generating physically-based rendering (PBR) materials. By leveraging semi-supervised adversarial training, this method incorporates unannotated data and advanced refinement techniques to produce high-quality, realistic materials.
StableMaterials incorporates unannotated (non-PBR) samples into the training process, utilizing knowledge distilled from a large-scale pretrained model (SDXL). This approach allows the model to benefit from a broader range of data, enhancing its ability to generate diverse and realistic materials even with limited annotated data.
The method combines a traditional supervised loss with an unsupervised adversarial loss. This combination forces the model to generate realistic maps for unannotated samples, effectively bridging the gap between the annotated and unannotated data distributions and ensuring that the generated materials maintain high realism and consistency.
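A minimal sketch of what such a combined objective might look like; the tensor layout, discriminator interface, and loss weight below are illustrative placeholders, not the actual StableMaterials implementation:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(pred_maps, gt_maps, disc_logits_fake, annotated_mask, adv_weight=0.1):
    """Combine a supervised reconstruction term (annotated samples only)
    with an adversarial realism term (all samples).

    pred_maps:        (B, C, H, W) predicted PBR maps
    gt_maps:          (B, C, H, W) ground-truth maps (ignored where unannotated)
    disc_logits_fake: (B,) discriminator logits for the predicted maps
    annotated_mask:   (B,) boolean, True where ground truth is available
    """
    # Supervised term: only computed on samples that carry PBR annotations.
    if annotated_mask.any():
        sup = F.l1_loss(pred_maps[annotated_mask], gt_maps[annotated_mask])
    else:
        sup = pred_maps.new_zeros(())

    # Adversarial term: non-saturating GAN loss applied to every sample, pushing
    # maps predicted for unannotated (SDXL-distilled) textures toward the
    # distribution of real material maps.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake)
    )
    return sup + adv_weight * adv
```

The key design point is that the adversarial term supplies a training signal for samples that have no ground-truth maps, which is what lets the unannotated texture data influence the generator.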
StableMaterials employs a diffusion-based refinement model inspired by the SDXL architecture. Techniques such as SDEdit and patched diffusion are used to enhance visual quality and achieve high-resolution generation: the model first generates materials at a base resolution of 512×512 and then refines them to higher resolutions while maintaining consistency and memory efficiency.
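As a rough illustration of the two ideas (SDEdit-style re-noising plus patch-wise denoising), the sketch below uses a placeholder `denoise_patch` callable in place of the pretrained refiner; the strength, patch size, and blending are illustrative choices rather than the paper's exact settings:

```python
import torch

def sdedit_patched_refine(latent, denoise_patch, strength=0.4, patch=64, overlap=16):
    """Refine an upscaled latent by partially re-noising it (SDEdit) and then
    denoising it patch by patch, averaging overlaps to bound memory use.

    latent:        (C, H, W) upscaled latent, assumed larger than `patch`
    denoise_patch: callable (C, p, p) -> (C, p, p), stands in for the refiner
    """
    # SDEdit: add noise proportional to the refinement strength, so the
    # refiner only reintroduces fine detail instead of resampling the material.
    noisy = (1 - strength) * latent + strength * torch.randn_like(latent)

    out = torch.zeros_like(latent)
    weight = torch.zeros_like(latent)
    step = patch - overlap
    _, H, W = latent.shape
    for y in range(0, max(H - overlap, 1), step):
        for x in range(0, max(W - overlap, 1), step):
            y0, x0 = min(y, H - patch), min(x, W - patch)
            tile = noisy[:, y0:y0 + patch, x0:x0 + patch]
            out[:, y0:y0 + patch, x0:x0 + patch] += denoise_patch(tile)
            weight[:, y0:y0 + patch, x0:x0 + patch] += 1.0
    # Average overlapping contributions to blend patch borders.
    return out / weight.clamp(min=1.0)
```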
To speed up generation, StableMaterials distills a latent consistency model that reduces the number of inference steps. This model condenses the denoising trajectory into as few as four steps, enabling faster generation without compromising quality.
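A simplified sketch of few-step consistency sampling (not the distillation procedure itself); `consistency_fn` is a hypothetical stand-in for the distilled model, which maps a noisy latent at a given noise level directly to a clean estimate:

```python
import torch

def lcm_sample(consistency_fn, shape, steps=4, sigma_max=1.0):
    """Few-step sampling with a (hypothetical) latent consistency model.

    consistency_fn: callable (x_t, sigma) -> x0 estimate, assumed distilled
                    from the full diffusion model.
    """
    # Start from pure noise at the highest noise level.
    x = torch.randn(shape) * sigma_max
    # A short, decreasing noise schedule (e.g. 4 steps instead of ~50).
    sigmas = torch.linspace(sigma_max, 0.0, steps + 1)
    for i in range(steps):
        x0 = consistency_fn(x, sigmas[i])  # jump straight to a clean estimate
        if i < steps - 1:
            # Re-noise the estimate to the next (lower) noise level and repeat.
            x = x0 + sigmas[i + 1] * torch.randn(shape)
        else:
            x = x0
    return x
```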
The method introduces a novel feature rolling technique to address tileability and minimize visible seams. This technique moves tensor rolling from the diffusion step into the U-Net architecture, directly shifting the feature maps within each convolutional and attention layer. This ensures seamless tiling and reduces the artifacts that can arise from fewer diffusion steps.
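A simplified sketch of the rolling idea applied to a single convolution; the layer choice, shift schedule, and how shifts are shared across layers are assumptions for illustration, not the paper's exact scheme:

```python
import torch
import torch.nn as nn

class RolledConv2d(nn.Module):
    """Wrap a standard convolution with a random circular shift of its input,
    undone after the convolution. Because torch.roll wraps content around,
    the would-be tile seam lands at a different position in every layer and
    forward pass, so border artifacts no longer accumulate along one edge."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        h, w = x.shape[-2:]
        dy = int(torch.randint(h, (1,)))
        dx = int(torch.randint(w, (1,)))
        x = torch.roll(x, shifts=(dy, dx), dims=(-2, -1))      # move the seam to a random spot
        x = self.conv(x)
        return torch.roll(x, shifts=(-dy, -dx), dims=(-2, -1))  # restore spatial alignment
```

Rolling inside the network rather than between diffusion steps means the seam is exposed to every layer even when only four denoising steps are run, which is why the technique pairs well with the consistency model above.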
@article{vecchio2024stablematerials,
  title={StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning},
  author={Vecchio, Giuseppe},
  journal={arXiv preprint arXiv:2406.09293},
  year={2024}
}