Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

IntrinsiX: High-Quality PBR Generation using Image Priors

Authors: Peter Kocsis, Lukas Höllein, Matthias Niessner

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our results demonstrate detailed intrinsic generation with strong generalization capabilities that outperforms existing intrinsic image decomposition methods used with generated images by a significant margin. Finally, we show a series of applications, including re-lighting, editing, and for the first time text-conditioned room-scale PBR texture generation. We summarize our method in Figure 2. ... 5 Experiments Training and Testing Details In the first stage, we train the LoRAs separately for 2K iterations with a batch size of 10, which takes 5h on a single NVIDIA A100 (80GB) GPU. ... Table 1: Baseline comparisons. We compare the albedo quality for in-distribution (A-ID-FID) and out-of-distribution (A-OOD-FID) settings as well as perceptually with a user study (A-PQ). ... 5.2 Ablations The main technical contributions of our method are the cross-intrinsic attention (Section 3.2.1) and the rendering loss (Section 3.2.2). In the following, we highlight the importance of each component.
Researcher Affiliation	Academia	Peter Kocsis Lukas Höllein Matthias Nießner Technical University of Munich
Pseudocode	No	The paper describes the method and individual components (PBR Prior Training, PBR Prior Alignment, Cross-Intrinsic Attention, RGB Rendering Loss) in detail using prose and mathematical equations. It does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	We will release the code and the pre-trained model weights. ... Justification: We will release the training and testing codes along with our trained model weights upon acceptance.
Open Datasets	Yes	Unfortunately, existing datasets, such as Openrooms [35], InteriorVerse [76] or Hypersim [49], contain either only synthetic examples of intrinsic decompositions or are limited in size. ... Dataset for albedo and normals Thanks to utilizing a pre-trained image prior, our method does not require extensive PBR datasets, which are generally not available. We collect as little as 20 synthetic examples of albedo and normal maps from the InteriorVerse dataset [76]. ... For out-of-distribution (A-OOD-FID), we evaluate on the pre-processed G-Buffer renderings [47] of ObjaVerse [10] (GObjaVerse).
Dataset Splits	No	For in-distribution (A-ID-FID), we use all 2595 albedo images from the InteriorVerse [76] test set and caption them based on the corresponding renderings with Florence-2 [66]. For each caption, we generate an albedo image, creating a total of 2595 generated albedo images. For out-of-distribution (A-OOD-FID), we evaluate on the pre-processed G-Buffer renderings [47] of ObjaVerse [10] (GObjaVerse). We take 1000 samples from the diverse "Daily-Used" category. As before, we generate an albedo map for each of the prompts, creating a total of 1000 generated albedo images. In both cases, we calculate FID against the respective ground-truth albedos. ... Dataset for albedo and normals ... We collect as little as 20 synthetic examples of albedo and normal maps from the InteriorVerse dataset [76]. ... Dataset for roughness and metallic Similarly, we collect and caption samples for roughness and metallic properties. ... we curate a large dataset of 20K roughness/metallic samples using the InteriorVerse dataset [76].
Hardware Specification	Yes	Training and Testing Details In the first stage, we train the LoRAs separately for 2K iterations with a batch size of 10, which takes 5h on a single NVIDIA A100 (80GB) GPU. In the second stage, we finetune for another 2.5K iterations with a batch size of 30 (10 aligned PBR maps), which takes 21h. We employ the Prodigy optimizer [43] in both stages. The LoRA layers use a rank of 64, which gives a total of 224M additional parameters. For inference, we use a single NVIDIA A6000 (48GB) GPU. Sampling a single image takes around 12 seconds.
Software Dependencies	No	The paper mentions using the Flux.1-dev model [32] and the Prodigy optimizer [43], but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	Training and Testing Details In the first stage, we train the LoRAs separately for 2K iterations with a batch size of 10, which takes 5h on a single NVIDIA A100 (80GB) GPU. In the second stage, we finetune for another 2.5K iterations with a batch size of 30 (10 aligned PBR maps), which takes 21h. We employ the Prodigy optimizer [43] in both stages. The LoRA layers use a rank of 64, which gives a total of 224M additional parameters. For inference, we use a single NVIDIA A6000 (48GB) GPU. Sampling a single image takes around 12 seconds. ... During the second finetuning stage, we sample 5 directional light sources in every iteration and render a separate RGB image with each of them. The final loss then becomes $L = L_{\mathrm{CFM}} + \sum_{i=1}^{5} L_{\mathrm{rgb}}(\hat{I}_i, I_i)$. We do not backpropagate $L_{\mathrm{rgb}}$ to the parameters $\theta_n$ of the normal LoRA, as we find it stabilizes the rendering quality by avoiding ambiguities between material and geometry.
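
The second-stage loss quoted in the Experiment Setup row adds one RGB rendering term per sampled light to the base loss. The sketch below illustrates only how that sum is assembled. It is not the authors' implementation: the Lambertian shading, the mean-squared-error form of the RGB loss, the hemisphere sampling, and every function name are assumptions made for illustration, and the base loss is passed in as a plain number.

```python
import math
import random

def sample_directional_lights(count, rng):
    """Sample unit light directions with positive z (assumed sampling scheme)."""
    lights = []
    for _ in range(count):
        v = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(0.1, 1.0))
        norm = math.sqrt(sum(c * c for c in v))
        lights.append(tuple(c / norm for c in v))
    return lights

def shade(albedo, normals, light):
    """Stand-in Lambertian renderer: pixel = albedo * max(0, n . l)."""
    return [
        a * max(0.0, sum(nc * lc for nc, lc in zip(n, light)))
        for a, n in zip(albedo, normals)
    ]

def mse(pred, target):
    """Mean squared error, used here as a hypothetical RGB loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def total_loss(l_cfm, pred_maps, gt_maps, lights):
    """L = L_CFM + sum over lights of L_rgb(I_hat_i, I_i)."""
    loss = l_cfm
    for light in lights:
        rendered_pred = shade(pred_maps["albedo"], pred_maps["normals"], light)
        rendered_gt = shade(gt_maps["albedo"], gt_maps["normals"], light)
        # In an autodiff framework, the predicted normals would be detached
        # before this term so it does not update the normal LoRA (not shown).
        loss += mse(rendered_pred, rendered_gt)
    return loss

rng = random.Random(0)
lights = sample_directional_lights(5, rng)  # 5 lights per iteration, as quoted
up = (0.0, 0.0, 1.0)
gt = {"albedo": [0.8, 0.5], "normals": [up, up]}
pred = {"albedo": [0.7, 0.5], "normals": [up, up]}
loss = total_loss(0.25, pred, gt, lights)
```

When the predicted maps equal the ground truth, every rendering term vanishes and the total reduces to the base loss, which is a quick sanity check for any real renderer substituted for `shade`.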