Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Multi-model Ensemble Conformal Prediction in Dynamic Environments
Authors: Erfan Hajihashemi, Yanning Shen
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, the performance of the proposed method, SAMOCP, is assessed within the context of classification tasks. We conduct a comprehensive comparison with recently proposed methods in online conformal prediction for dynamic environments within classification tasks. |
| Researcher Affiliation | Academia | Erfan Hajihashemi Department of Electrical Engineering & Computer Science University of California, Irvine EMAIL Yanning Shen Department of Electrical Engineering & Computer Science University of California, Irvine EMAIL |
| Pseudocode | Yes | Algorithm 1 Multi-model Ensemble Online Conformal Prediction (MOCP) |
| Open Source Code | Yes | Codes are available at https://github.com/erfanhajihashemi/Multi-model-Ensemble-Conformal-Predictionin-Dynamic-Environments. |
| Open Datasets | Yes | Dataset: We utilize corrupted versions of CIFAR-10 and CIFAR-100 [Krizhevsky, 2009], known as CIFAR-10C and CIFAR-100C [Hendrycks and Dietterich, 2019]. ... All real datasets are downloaded from the Zenodo repository. |
| Dataset Splits | No | The paper mentions data being split into 'batches of 500 data samples each' and uses a 'calibration dataset' which is an 'evolving calibration dataset' in the online setting, but does not provide explicit train/validation/test dataset splits with percentages or sample counts for reproducibility in a typical static ML setup. |
| Hardware Specification | Yes | All experiments were performed on a workstation with NVIDIA RTX A4000 GPU. |
| Software Dependencies | No | The paper mentions various learning models (e.g., ResNet-50, GoogLeNet) and notes that codes are available on GitHub, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, specific library versions). |
| Experiment Setup | Yes | For every experiment conducted on the synthetic dataset, CIFAR-10C, CIFAR-100C, parameters ϵ, Ο, and η were selected through grid search, with values of 0.9, 140, and 0.05, respectively. The hyperparameters ξ and k_reg are set to 0.02 and 5 for CIFAR-100C, and 0.1 and 1 for CIFAR-10C, respectively. |
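
The Dataset Splits row notes that data arrive in batches of 500 samples each rather than in fixed train/validation/test splits. A minimal sketch of that kind of batching is shown below. It assumes a generic in-memory sequence; the function name `iter_batches`, the placeholder stream, and the choice to keep a shorter final batch are all illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Batch size quoted in the Dataset Splits row of the table above.
BATCH_SIZE = 500


def iter_batches(samples: Sequence[T], batch_size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive, non-overlapping batches in arrival order.

    Assumption: a final batch shorter than batch_size is still yielded.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


if __name__ == "__main__":
    # Placeholder stream of 1200 items; not data from the paper.
    stream = list(range(1200))
    print([len(batch) for batch in iter_batches(stream)])
```

In an online setting each yielded batch would be processed in order, so the sketch deliberately avoids shuffling.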