Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Contextual Online Pricing with (Biased) Offline Data

Authors: Yixuan Zhang, Ruihao Zhu, Qiaomin Xie

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct numerical experiments on synthetic data to assess our algorithms, leaving experiments on real data for future work. Specifically, we evaluate CO3 (or GCO3 when d_2 > 1) against four baselines: 1) UCB: a pure online UCB policy that ignores the offline data, 2) UCB-Offline: the UCB policy of [7, 31], which forms its single confidence ellipsoid from the combined offline and online data, 3) TS: a pure online Thompson-sampling policy, and 4) TS-Offline: Thompson-sampling with a prior fitted to the offline data. We randomly generate two online models: 1) a scalar price elasticity case with d_2 = 1 and 2) a general case with d_2 = 5. In both cases the offline data is drawn from a market with exact bias V_true = Θ(T^{5/16}) and dispersion λ_min(Σ̂) = Θ(T). We compare CO3/GCO3 against four baselines under two bias-bound settings: a tight bound V = 1.1 V_true and a loose bound V = 10 V_true. Every configuration is averaged over 20 independent trials with T = 1000 rounds; shaded bands indicate 2-sigma error bars. Figures 2(a) and 2(b) reveal several trends.
Researcher Affiliation	Academia	Yixuan Zhang, Department of Industrial & Systems Engineering, University of Wisconsin-Madison, EMAIL; Ruihao Zhu, SC Johnson College of Business, Cornell University, EMAIL; Qiaomin Xie, Department of Industrial & Systems Engineering, University of Wisconsin-Madison, EMAIL
Pseudocode	Yes	Algorithm 1: CO3 Algorithm; Algorithm 2: GCO3 Algorithm; Algorithm 3: RCO3 Algorithm
Open Source Code	Yes	5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes]
Open Datasets	No	In this section, we conduct numerical experiments on synthetic data to assess our algorithms, leaving experiments on real data for future work. Specifically, we evaluate CO3 (or GCO3 when d_2 > 1) against four baselines... We randomly generate two online models... and ten independent offline datasets are generated
Dataset Splits	No	In this section, we conduct numerical experiments on synthetic data to assess our algorithms, leaving experiments on real data for future work. Specifically, we evaluate CO3 (or GCO3 when d_2 > 1) against four baselines... We randomly generate two online models... and ten independent offline datasets are generated
Hardware Specification	No	8. Experiments compute resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: This paper provides sufficient information on the computer resources needed to reproduce the experiments. All experiments can be conducted on a personal computer.
Software Dependencies	No	The paper does not explicitly list specific software dependencies with version numbers in the provided text.
Experiment Setup	Yes	Every configuration is averaged over 20 independent trials with T = 1000 rounds; shaded bands indicate 2-sigma error bars. Figures 2(a) and 2(b) reveal several trends. First, UCB-Offline and TS-Offline rely uncritically on the biased offline data and accumulate regret faster than the pure-online baselines, illustrating the danger of ignoring distributional shift. Second, when the bias bound is tight (V = 1.1 V_true), CO3/GCO3 decisively outperform every baseline, in line with Theorems 1 and 3. Finally, even under a loose bound (V = 10 V_true), CO3/GCO3 track the performance of UCB and incur no additional regret, demonstrating the algorithms' never-worse safety property. We next evaluate RCO3 in the general setting with d_2 = 5. A single online model is randomly generated and fixed, and ten independent offline datasets are generated, each with dispersion λ_min(Σ̂) = Θ(T) but different exact biases V_true^2 = Θ(T^{n/5}) for n = 0, . . . , 9. For every offline-online instance we run RCO3 with a test phase of length Θ(T^{1/4}) (α = 1/4) and compare it to the pure-online UCB baseline, repeating each policy 20 times. Figure 2(c) reports the mean cumulative regret at T = 5000 with a 2-sigma error bar as a function of V_true.
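
The reporting protocol quoted in the Experiment Setup row (20 independent trials, T = 1000 rounds, 2-sigma bands) can be illustrated with a minimal sketch. This is not the authors' code: `run_trial` is a hypothetical placeholder that draws random per-round regret instead of running CO3 or a baseline, and the sketch assumes the band is the mean plus or minus two sample standard deviations across trials, which the quoted text does not specify (it could equally be two standard errors).

```python
import random
import statistics


def run_trial(T, seed):
    """Hypothetical stand-in for one policy run.

    Returns the cumulative regret after each of T rounds. The per-round
    regret here is placeholder random noise, not a pricing policy.
    """
    rng = random.Random(seed)
    total = 0.0
    curve = []
    for _ in range(T):
        total += 0.1 * rng.random()  # placeholder non-negative regret
        curve.append(total)
    return curve


def summarize(curves, n_sigma=2.0):
    """Per-round mean and n-sigma band across trials.

    Assumption: sigma is the sample standard deviation across trials.
    """
    means, lowers, uppers = [], [], []
    for values in zip(*curves):  # one tuple of trial values per round
        m = statistics.fmean(values)
        s = statistics.stdev(values)
        means.append(m)
        lowers.append(m - n_sigma * s)
        uppers.append(m + n_sigma * s)
    return means, lowers, uppers


# 20 independent trials of T = 1000 rounds, one seed per trial.
curves = [run_trial(T=1000, seed=s) for s in range(20)]
mean, lower, upper = summarize(curves)
```

Swapping `run_trial` for an actual policy run would leave the aggregation unchanged; recording the seeds per trial is what makes the averaged curve repeatable.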