Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Homogeneous Algorithms Can Reduce Competition in Personalized Pricing

Authors: Nathanael Jo, Ashia C Wilson, Kathleen Creel, Manish Raghavan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate our theoretical results in a stylized empirical study where two firms compete using personalized pricing algorithms. Our results demonstrate a new mechanism for achieving collusion through correlation, which allows us to analyze its legal implications. (...) We also conduct empirical analyses demonstrating that firms choosing different model classes or choosing to share training data may lead to correlated models in equilibrium.
Researcher Affiliation	Academia	Nathanael Jo Massachusetts Institute of Technology EMAIL Kathleen A. Creel Northeastern University EMAIL Ashia Wilson Massachusetts Institute of Technology EMAIL Manish Raghavan Massachusetts Institute of Technology EMAIL
Pseudocode	No	The paper does not contain any explicitly labeled pseudocode or algorithm blocks. Theoretical concepts and models are described in paragraph form or mathematical equations, but not in a structured, code-like format.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Yes, the code and data are available and reproducible, see the supplementary material. Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [Yes] Justification: Our code is well-documented via a Read Me file and attached in the repository.
Open Datasets	Yes	We use ACSIncome data [17], which contains US Census data from 2018. (...) Both firms train and test on Census data in California. The test set is 30% of the data (n = 58,700) and is fixed across both firms. We randomly split half of the remaining 70% as the training set for Firm 1, and the other half for Firm 2, each having 35% of the entire data to train (n = 68,482).
Dataset Splits	Yes	Both firms train and test on Census data in California. The test set is 30% of the data (n = 58,700) and is fixed across both firms. We randomly split half of the remaining 70% as the training set for Firm 1, and the other half for Firm 2, each having 35% of the entire data to train (n = 68,482). We repeat the training data splits over 15 random seeds.
Hardware Specification	Yes	All experiments (including the ones outlined in the following Section) were run using an Apple Silicon M2 chip with 16 GB. They only require CPUs and are not computationally expensive; any modern computer can easily run these experiments.
Software Dependencies	No	All unspecified hyperparameters use the default values set by scikit-learn. All experiments (including the ones outlined in the following Section) were run using an Apple Silicon M2 chip with 16 GB.
Experiment Setup	Yes	We chose the following model hyperparameters to simulate a higher performance for random forests compared to logistic regression: Model | Hyperparameters. Logistic Regression | ℓ1-penalty, saga solver. Random Forest | # trees = 9; min # samples in each leaf = 7; weight: 1.2x for negative class.
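
The split and hyperparameters quoted in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_indices`, the `test_seed` parameter, and the choice to hold the test set constant across repeats are hypothetical, and the two dictionaries are only a guess at how the reported settings might map onto scikit-learn's `LogisticRegression` and `RandomForestClassifier` keyword arguments. In particular, reading "weight: 1.2x for negative class" as a `class_weight` with label 0 as the negative class is an assumption.

```python
import random

# Hypothetical keyword arguments mirroring the reported hyperparameters.
# Assumption: the 1.2x weight maps to scikit-learn's class_weight, with
# label 0 treated as the negative class. All other settings stay at defaults.
LOGISTIC_REGRESSION_KWARGS = {"penalty": "l1", "solver": "saga"}
RANDOM_FOREST_KWARGS = {
    "n_estimators": 9,
    "min_samples_leaf": 7,
    "class_weight": {0: 1.2, 1: 1.0},
}


def split_indices(n_rows, split_seed, test_seed=0, test_frac=0.30):
    """Return (test, firm1_train, firm2_train) lists of row indices."""
    indices = list(range(n_rows))
    # Shared test set: shuffled with a seed that does not vary across repeats
    # (an assumption about how "fixed across both firms" is realized).
    random.Random(test_seed).shuffle(indices)
    n_test = round(test_frac * n_rows)
    test, remaining = indices[:n_test], indices[n_test:]
    # The remaining 70% is reshuffled per seed and halved between the firms.
    random.Random(split_seed).shuffle(remaining)
    half = len(remaining) // 2
    return test, remaining[:half], remaining[half:]


if __name__ == "__main__":
    # Repeat the training split over 15 seeds, as described in the table.
    for seed in range(15):
        test, firm1, firm2 = split_indices(1000, split_seed=seed)
        print(seed, len(test), len(firm1), len(firm2))
```

Separating the test seed from the split seed keeps the evaluation rows identical across the 15 repeats, so only each firm's training half varies between runs.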