We introduce MatSynth, a dataset of 4,000+ CC0 ultra-high resolution PBR materials. Materials are crucial components of virtual relightable assets, defining the interaction of light at the surface of geometries. Given their importance, significant research effort has been dedicated to their representation, creation and acquisition. However, in the past 6 years, most research in material acquisition or generation has relied either on the same single dataset or on large company-owned libraries of procedural materials. With this dataset we propose a significantly larger, more diverse, and higher resolution set of materials than previously publicly available. We carefully discuss the data collection process and demonstrate the benefits of this dataset on material acquisition and generation applications. The complete data further contains metadata with each material's origin, license, category, tags, creation method and, when available, descriptions and physical size, as well as 3M+ renderings of the augmented materials, at 1K resolution, under various environment lightings.
MatSynth is a new large-scale dataset comprising over 4,000 ultra-high resolution Physically Based
Rendering (PBR) materials, all released under permissive licensing.
All materials in the dataset are represented by a common set of maps (Basecolor, Diffuse, Normal, Height,
Roughness, Metallic, Specular and, when useful, Opacity), modelling both the reflectance and mesostructure
of the material.
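As a minimal sketch of how the common map set could be checked on disk, the snippet below reports which of the listed maps are missing from a material folder. The file naming (one lowercase `<map>.png` file per map) and the helper name `missing_maps` are assumptions made for illustration and are not taken from the dataset itself.

```python
from pathlib import Path

# Map names as listed in the dataset description.
REQUIRED_MAPS = ["basecolor", "diffuse", "normal", "height",
                 "roughness", "metallic", "specular"]
# Opacity is only provided "when useful", so it is treated as optional.
OPTIONAL_MAPS = ["opacity"]


def missing_maps(material_dir, extension=".png"):
    """Return the required maps that have no file in material_dir.

    Assumption (hypothetical layout): each map is stored as a file
    named after the map, e.g. basecolor.png. Matching ignores case.
    """
    folder = Path(material_dir)
    present = {p.stem.lower() for p in folder.glob(f"*{extension}")}
    return [name for name in REQUIRED_MAPS if name not in present]
```

An empty list means every required map was found; a missing Opacity map is never reported because it is optional.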
Each material in the dataset comes with rich metadata, including information on its origin, licensing
details, category, tags, creation method, and, when available, descriptions and physical size. This
comprehensive metadata facilitates precise material selection and usage, catering to the specific needs of
users.
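The following sketch illustrates how such metadata could drive material selection. The record type `MaterialMetadata`, its field names and types, and the helper `select_materials` are hypothetical; they mirror the metadata fields listed above but do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MaterialMetadata:
    # Field names are assumptions that mirror the metadata listed in the text.
    name: str
    origin: str
    license: str
    category: str
    tags: List[str] = field(default_factory=list)
    creation_method: str = ""
    description: Optional[str] = None      # only when available
    physical_size: Optional[float] = None  # only when available; unit not specified here


def select_materials(records, category=None, required_tags=()):
    """Keep records matching the category (if given) and carrying all required tags."""
    wanted = {tag.lower() for tag in required_tags}
    selected = []
    for record in records:
        if category is not None and record.category.lower() != category.lower():
            continue
        if not wanted.issubset({tag.lower() for tag in record.tags}):
            continue
        selected.append(record)
    return selected
```

For example, `select_materials(records, category="Wood", required_tags=["oak"])` would return only the wood materials tagged as oak.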
The MatSynth dataset is divided into two splits: the test split, containing 89 materials, and the train split, consisting of 3,980 materials. To enhance accessibility and ease of navigation, each split is further organized into separate folders for each distinct category present in the dataset (Blends, Ceramic, Concrete, Fabric, Ground, Leather, Marble, Metal, Misc, Plastic, Plaster, Stone, Terracotta, Wood).
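To make the split and category organization concrete, here is a small sketch that counts materials per category in a local copy of the data. It assumes a layout of `<root>/<split>/<category>/<material>/`, with folder names spelled exactly like the categories above and one subfolder per material; that layout and the helper name `count_materials` are assumptions, not a documented structure.

```python
from collections import Counter
from pathlib import Path

# Categories and split sizes as stated in the dataset description.
CATEGORIES = ["Blends", "Ceramic", "Concrete", "Fabric", "Ground", "Leather",
              "Marble", "Metal", "Misc", "Plastic", "Plaster", "Stone",
              "Terracotta", "Wood"]
SPLIT_SIZES = {"train": 3980, "test": 89}


def count_materials(root, split):
    """Count material folders per category for one split.

    Assumption (hypothetical layout): root/split/category/material/.
    Categories without a folder are simply absent from the result.
    """
    split_dir = Path(root) / split
    counts = Counter()
    for category in CATEGORIES:
        category_dir = split_dir / category
        if category_dir.is_dir():
            counts[category] = sum(1 for entry in category_dir.iterdir() if entry.is_dir())
    return counts
```

If the layout assumption holds for a complete download, `sum(count_materials(root, "train").values())` should equal `SPLIT_SIZES["train"]`, and likewise for the test split.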
The MatSynth dataset is designed to support modern, learning-based techniques for a variety of material-related tasks including, but not limited to, material acquisition, material generation, and synthetic data generation, e.g., for retrieval or segmentation.
The MatSynth dataset is the result of an extensive collection of data from multiple online sources
operating under the CC0 and CC-BY licensing frameworks. This collection strategy captures a broad
spectrum of materials, from commonly used ones to more niche or specialized variants, while guaranteeing
that the data can be used for a variety of use cases.
Materials under the CC0 license were collected from AmbientCG, CGBookCase, PolyHeaven, ShareTextures, and
TextureCan. The dataset also includes a limited set of materials from the artist Julio Sillet, distributed
under the CC-BY license.
@inproceedings{vecchio2024matsynth,
author = {Vecchio, Giuseppe and Deschaintre, Valentin},
title = {MatSynth: A Modern PBR Materials Dataset},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {22109-22118}
}