Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Semantic uncertainty intervals for disentangled latent spaces
Authors: Swami Sankaranarayanan, Anastasios Angelopoulos, Stephen Bates, Yaniv Romano, Phillip Isola
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 Experiments; 3.1 Dataset descriptions; 3.2 Experimental setup; 3.3 Findings |
| Researcher Affiliation | Academia | 1 MIT; 2 University of California, Berkeley; 3 Technion Israel Institute of Technology |
| Pseudocode | Yes | Algorithm 1 Quantile GAN encoder training |
| Open Source Code | No | The code will be released in the near future. |
| Open Datasets | Yes | FFHQ We use the StyleGAN framework pretrained using the Flickr-Faces-HQ (FFHQ) dataset [25]. FFHQ is a publicly available dataset consisting of 70,000 high-quality images at 1024×1024 resolution. ... CelebA-HQ We use the CelebA-HQ dataset [23]... CLEVR dataset [22]. |
| Dataset Splits | Yes | We generate 100k samples per model and generate a random 80-10-10 split for training, calibration and validation. |
| Hardware Specification | No | The paper's self-assessment in section 3d explicitly states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]" |
| Software Dependencies | No | The paper mentions software such as StyleGAN2, ResNet-50, the Ranger optimizer, and a VGG network, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The hyperparameter weights (Eq 8) are set to c1 = c2 = 10.0. ... a flat learning rate of 0.001 for all our experiments. ... The risk level α and the user-specified error threshold δ are fixed to 0.1, unless specified otherwise. ... For the image super-resolution training, we augment the input dataset by using different levels of downsampled inputs, i.e., we take the raw input and apply a random downsampling factor from {1, 4, 8, 16, 32} and resize it to the original dimensions. |
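
The Dataset Splits row quotes a random 80-10-10 split of 100k generated samples into training, calibration, and validation sets. A minimal sketch of such a split is shown below. The paper's code has not been released, so the function name `split_80_10_10`, the fixed seed, and the use of shuffled integer indices are assumptions made for illustration and not the authors' implementation.

```python
import random


def split_80_10_10(n_samples, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into 80/10/10 parts.

    Hypothetical helper: the seed and the index-based approach are
    illustrative assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)          # local RNG so the split is repeatable
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = n_samples * 8 // 10
    n_calib = n_samples // 10

    train = indices[:n_train]
    calib = indices[n_train:n_train + n_calib]
    valid = indices[n_train + n_calib:]  # remainder goes to validation
    return train, calib, valid


train_idx, calib_idx, valid_idx = split_80_10_10(100_000)
print(len(train_idx), len(calib_idx), len(valid_idx))  # prints "80000 10000 10000"
```

Keeping the calibration set disjoint from the training set matters here because the calibration samples are used to set the interval guarantees at risk level α and error threshold δ.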