MatSynth: A Modern PBR Materials Dataset

*Adobe Research
CVPR 2024

Samples of materials from the dataset. Our dataset contains 4,069 high-quality, 4K, tileable materials with permissive licenses. Each material is augmented, rendered, and supplemented by metadata containing its origin, tags, categories, method of creation, and more.

Abstract

We introduce MatSynth, a dataset of 4,000+ CC0 ultra-high resolution PBR materials. Materials are crucial components of virtual relightable assets, defining the interaction of light at the surface of geometries. Given their importance, significant research effort has been dedicated to their representation, creation and acquisition. However, in the past 6 years, most research in material acquisition or generation has relied either on the same single dataset or on a large, company-owned library of procedural materials. With this dataset we propose a significantly larger, more diverse, and higher resolution set of materials than previously publicly available. We carefully discuss the data collection process and demonstrate the benefits of this dataset on material acquisition and generation applications. The complete data further contains metadata with each material's origin, license, category, tags, creation method and, when available, descriptions and physical size, as well as 3M+ renderings of the augmented materials, in 1K, under various environment lightings.

The Dataset

MatSynth is a new large-scale dataset comprising over 4,000 ultra-high resolution Physically Based Rendering (PBR) materials, all released under permissive licensing.
All materials in the dataset are represented by a common set of maps (Basecolor, Diffuse, Normal, Height, Roughness, Metallic, Specular and, when useful, Opacity), modelling both the reflectance and mesostructure of the material.
Each material in the dataset comes with rich metadata, including information on its origin, licensing details, category, tags, creation method, and, when available, descriptions and physical size. This comprehensive metadata facilitates precise material selection and usage, catering to the specific needs of users.
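As an illustration of how such metadata could drive material selection, here is a minimal Python sketch. The `MaterialMeta` record, its field names, and the `select_materials` helper are all hypothetical: they mirror the metadata fields listed above but are not the dataset's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class MaterialMeta:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    origin: str
    license: str
    category: str
    tags: List[str] = field(default_factory=list)
    creation_method: Optional[str] = None
    description: Optional[str] = None      # present only when available
    physical_size: Optional[float] = None  # present only when available; unit assumed


def select_materials(
    materials: Iterable[MaterialMeta],
    category: Optional[str] = None,
    license_name: Optional[str] = None,
    required_tags: Iterable[str] = (),
) -> List[MaterialMeta]:
    """Return the materials matching every filter that was provided."""
    wanted = {t.lower() for t in required_tags}
    selected = []
    for m in materials:
        if category is not None and m.category.lower() != category.lower():
            continue
        if license_name is not None and m.license.upper() != license_name.upper():
            continue
        # Every requested tag must be present on the material.
        if not wanted.issubset({t.lower() for t in m.tags}):
            continue
        selected.append(m)
    return selected
```

For example, `select_materials(records, category="Wood", license_name="CC0")` would keep only the CC0 wood materials from a list of such records.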

Dataset Structure

The MatSynth dataset is divided into two splits: the test split, containing 89 materials, and the train split, consisting of 3,980 materials. To enhance accessibility and ease of navigation, each split is further organized into separate folders for each distinct category present in the dataset (Blends, Ceramic, Concrete, Fabric, Ground, Leather, Marble, Metal, Misc, Plastic, Plaster, Stone, Terracotta, Wood).
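The split and category organization described above can be walked with a few lines of Python. This is a sketch under stated assumptions: the on-disk layout `<root>/<split>/<category>/<material>/`, the split folder names `train` and `test`, and the `count_materials` helper are assumptions for illustration, not a documented layout.

```python
from collections import Counter
from pathlib import Path
from typing import Dict

# Category names as listed in the dataset description.
CATEGORIES = [
    "Blends", "Ceramic", "Concrete", "Fabric", "Ground", "Leather", "Marble",
    "Metal", "Misc", "Plastic", "Plaster", "Stone", "Terracotta", "Wood",
]


def count_materials(root: str) -> Dict[str, Counter]:
    """Count material folders per category for each split.

    Assumes (hypothetically) one sub-folder per material under
    <root>/<split>/<category>/.
    """
    counts: Dict[str, Counter] = {}
    for split in ("train", "test"):
        per_category: Counter = Counter()
        split_dir = Path(root) / split
        if split_dir.is_dir():
            for category_dir in sorted(split_dir.iterdir()):
                if category_dir.is_dir():
                    per_category[category_dir.name] = sum(
                        1 for p in category_dir.iterdir() if p.is_dir()
                    )
        counts[split] = per_category
    return counts
```

On a complete copy laid out this way, `sum(counts["train"].values())` and `sum(counts["test"].values())` should come to 3,980 and 89 respectively, which makes a quick integrity check after download.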

Dataset Creation

The MatSynth dataset is designed to support modern, learning-based techniques for a variety of material-related tasks including, but not limited to, material acquisition, material generation, and synthetic data generation, e.g. for retrieval or segmentation.

Source Data

The MatSynth dataset is the result of an extensive collection of data from multiple online sources operating under the CC0 and CC-BY licensing frameworks. This collection strategy allows us to capture a broad spectrum of materials, from commonly used ones to more niche or specialized variants, while guaranteeing that the data can be used for a variety of use cases.
Materials under the CC0 license were collected from AmbientCG, CGBookCase, PolyHeaven, ShareTextures, and TextureCan. The dataset also includes a limited set of materials from the artist Julio Sillet, distributed under the CC-BY license.
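Because CC-BY, unlike CC0, requires attribution, downstream users may want to track which sources need credit. The sketch below encodes the source-to-license mapping stated above; the `SOURCE_LICENSES` table and both helper functions are hypothetical conveniences, not part of the dataset's tooling.

```python
from typing import Iterable, List

# Source-to-license mapping as described in the text above.
SOURCE_LICENSES = {
    "AmbientCG": "CC0",
    "CGBookCase": "CC0",
    "PolyHeaven": "CC0",
    "ShareTextures": "CC0",
    "TextureCan": "CC0",
    "Julio Sillet": "CC-BY",
}


def requires_attribution(source: str) -> bool:
    """CC-BY material must be credited; CC0 material need not be."""
    return SOURCE_LICENSES[source] == "CC-BY"


def sources_to_credit(sources: Iterable[str]) -> List[str]:
    """Return the distinct sources that must be credited, in first-seen order."""
    credited: List[str] = []
    for source in sources:
        if requires_attribution(source) and source not in credited:
            credited.append(source)
    return credited
```

Calling `sources_to_credit` on the origins of the materials used in a project yields the list of names to include in its credits.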

BibTeX


@inproceedings{vecchio2023matsynth,
  title={MatSynth: A Modern PBR Materials Dataset},
  author={Vecchio, Giuseppe and Deschaintre, Valentin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}