Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Maximizing the Value of Predictions in Control: Accuracy Is Not Enough

Authors: Yiheng Lin, Christopher Yeh, Zaiwei Chen, Adam Wierman

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide examples (e.g., Example 3.3) that illustrate why analyzing prediction accuracy is insufficient... improving prediction accuracy may not always improve prediction power... The simulation code for all examples (Examples 3.3, 3.4, and A.2) can be found at https://github.com/yihenglin97/Prediction-Power.
Researcher Affiliation	Academia	Yiheng Lin, California Institute of Technology, Pasadena, CA, USA; Christopher Yeh, California Institute of Technology, Pasadena, CA, USA; Zaiwei Chen, Purdue University, West Lafayette, IN, USA; Adam Wierman, California Institute of Technology, Pasadena, CA, USA
Pseudocode	Yes	Algorithm 1: Prediction Power Evaluation; Algorithm 2: Expected Conditional Covariance Estimator (ECCE)
Open Source Code	Yes	The simulation code for all examples (Examples 3.3, 3.4, and A.2) can be found at https://github.com/yihenglin97/Prediction-Power. We submit the simulation code in the supplementary material.
Open Datasets	No	Suppose the disturbance is sampled Wt ~ N(0, I) i.i.d. at every time step t... We train a linear regressor to predict each entry of Wt from Vt(θ) (or Vt(I)) over a train dataset with 64000 independent samples.
Dataset Splits	Yes	Algorithm 2 Expected Conditional Covariance Estimator (ECCE)... 1: Split the dataset D to Dtrain, Dval, and Dtest. In the simulation, we train linear regressors to predict Wt and Wt+1 with the history It(1) or It(2) for each time step t < T = 100 over a train dataset of size 160000. Then, we plot the MSE time curve on a test dataset of size 40000.
Hardware Specification	Yes	Running this experiment takes about 50 seconds on Apple Mac mini with Apple M1 CPU.
Software Dependencies	No	We do not use any specific assets beyond standard open-source scientific Python packages such as numpy and matplotlib for running experiments.
Experiment Setup	Yes	We instantiate Example 3.3 with the following parameters: A = [[1, 0.1], [0, 1]], B = [[0], [0.1]], R = (1), and θ := [[1, 0.99], [0, 0.141]]... we sample the true disturbance Wt ~ N(0, I) i.i.d. and fix the coefficient ρ = 0.5. The online policy optimization starts with the initial policy parameter Υ0 = 0. When implementing M-GAPS in both scenarios, we use the decaying learning rate sequence ηt = (1 + t/1000)^(-0.5). We simulate 30 random trajectories with T = 80000.
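
To make the quoted experiment setup concrete, here is a minimal sketch of how its numerical ingredients could be written down in Python with numpy. It is not the authors' code (that lives in the linked repository). The matrices A and B and the learning rate formula are taken from the Experiment Setup row; the function names, the random seed, and the linear dynamics form x_{t+1} = A x_t + B u_t + w_t are assumptions made for illustration only.

```python
import numpy as np

# Values quoted in the Experiment Setup row for Example 3.3.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])


def learning_rate(t):
    """Decaying step size eta_t = (1 + t/1000)^(-0.5), as quoted in the table."""
    return (1.0 + t / 1000.0) ** -0.5


def sample_disturbances(num_steps, dim=2, seed=0):
    """Draw W_t ~ N(0, I) i.i.d.; the seed is an arbitrary illustrative choice."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_steps, dim))


def rollout(x0, controls, disturbances):
    """Simulate x_{t+1} = A x_t + B u_t + w_t (assumed form of the dynamics)."""
    x = np.asarray(x0, dtype=float)
    states = [x]
    for u, w in zip(controls, disturbances):
        x = A @ x + B @ np.atleast_1d(u) + w
        states.append(x)
    return np.stack(states)


# Short illustrative horizon; the table reports T = 80000 and 30 trajectories.
T = 5
W = sample_disturbances(T)
U = np.zeros((T, 1))  # placeholder controls, not the M-GAPS policy
trajectory = rollout([0.0, 0.0], U, W)
print(trajectory.shape)  # (6, 2): initial state plus T steps
```

The zero controls are only a placeholder: the sketch covers the dynamics, disturbance sampling, and step-size schedule, and leaves the policy optimization itself to the authors' repository.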