Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multi-Dimensional Conformal Prediction
Authors: Yam Tawachi, Bracha Laufer-Goldshtein
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our multi-dimensional framework offers superior efficiency compared to baseline methods across various settings. |
| Researcher Affiliation | Academia | Yam Tawachi & Bracha Laufer-Goldshtein, School of Electrical and Computer Engineering, Tel-Aviv University, Tel-Aviv, Israel. {yamtawachi@mail,blaufer@tauex}.tau.ac.il |
| Pseudocode | Yes | Algorithm 1 Multi-Score Conformal Prediction |
| Open Source Code | Yes | Our code is available at: https://github.com/yamtawa/Multi-CP |
| Open Datasets | Yes | We test our method over three image classification datasets, with varying numbers of classes and difficulty levels: CIFAR100 (Krizhevsky et al., 2009), Tiny ImageNet (Le & Yang, 2015), and PathMNIST (Yang et al., 2023a). |
| Dataset Splits | Yes | Table C.1 (Dataset Details; classes / train / validation / calibration / test / avg. accuracy): Tiny ImageNet: 200 / 71,500 / 11,000 / 16,500 / 11,000 / 0.58. CIFAR100: 100 / 39,000 / 6,000 / 9,000 / 6,000 / 0.69. PathMNIST: 9 / 69,667 / 10,718 / 16,077 / 10,718 / 0.94. 20 Newsgroups: 20 / 9,800 / 2,449 / 3,298 / 3,299 / 0.87. ImageNet: 1,000 / 1,281,184 / 10,000 / 20,000 / 20,000 / 0.71 |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or other computing resources used for the experiments. It only refers to models like ResNet50 and ViT. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and refers to "PyTorch", but does not provide specific version numbers for these or any other software dependencies. Without version numbers, reproducibility of the software environment is not guaranteed. |
| Experiment Setup | Yes | For the first three datasets we used a ResNet50 model with weights pretrained on ImageNet. Each head is a 3-layer feed-forward neural network with BatchNorm, ReLU activation, and dropout with p = 0.1. In the first stage, the full model with a single classification head was fine-tuned on each task with 20, 100, and 200 epochs for Tiny ImageNet, CIFAR100, and PathMNIST, respectively. In the second stage, we freeze the backbone model and train only the classification heads for 20 epochs, using the loss defined in Eq. (13). In both stages, we use the Adam optimizer with a cosine annealing scheduler, momentum decay of 0.95, weight decay of 1e-5, and batch size of 16. For the computation of the RAPS score we used λ = 0.05 and κ = 5, and for the SAPS score we set ξ = 0.3. |
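To make the RAPS hyperparameters reported above concrete (λ = 0.05, κ = 5), here is a minimal, self-contained sketch of split conformal prediction with a RAPS-style conformity score. This is an illustration of the standard technique, not the authors' multi-score Algorithm 1; all function and variable names are our own, and the fixed randomization term `u` is a simplifying assumption.

```python
import numpy as np

def raps_score(probs, label, lam=0.05, kappa=5, u=0.5):
    """RAPS-style score: cumulative probability mass down to the true
    label's rank, randomized by u, plus a penalty for ranks beyond kappa.
    Uses the paper's reported lambda = 0.05 and kappa = 5 as defaults."""
    order = np.argsort(-probs)                    # classes by descending prob.
    rank = int(np.nonzero(order == label)[0][0])  # 0-based rank of the label
    cum = probs[order][: rank + 1].sum()
    penalty = lam * max(0, (rank + 1) - kappa)
    return cum - u * probs[label] + penalty

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile at level ceil((n+1)(1-alpha))/n."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

def prediction_set(probs, tau, lam=0.05, kappa=5, u=0.5):
    """All labels whose score is at or below the calibrated threshold."""
    return [y for y in range(len(probs))
            if raps_score(probs, y, lam, kappa, u) <= tau]

# Toy usage with synthetic softmax outputs standing in for a classifier.
rng = np.random.default_rng(0)
K, n_cal = 10, 500
logits = rng.normal(size=(n_cal, K))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, K, size=n_cal)

cal_scores = np.array([raps_score(p, y) for p, y in zip(probs, labels)])
tau = conformal_threshold(cal_scores, alpha=0.1)
test_set = prediction_set(probs[0], tau)
```

By construction, at least a 1 - α fraction of the calibration scores fall below `tau`, which is what yields the marginal coverage guarantee on exchangeable test points.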