Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Tight Lower Bounds and Improved Convergence in Performative Prediction
Authors: Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami, Simon Lacoste-Julien, Gauthier Gidel
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper, for the first time, extends the Repeated Risk Minimization (RRM) algorithm class by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers that converges to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previous existing bounds within the same regime. We also prove that our new algorithm class can surpass the lower bound for standard RRM, thus breaking the prior lower bound, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our scheme. 5 Finally, we present the first lower bound techniques for iterative retraining schemes and apply it to both Perdomo et al. [2020] and our modified framework to establish theoretical lower bounds for ARM, detailing the maximum potential improvement in convergence rates achievable through the use of past datasets. 7 Experiments We conduct experiments in two semi-synthetic environments to evaluate whether aggregating past snapshots improves convergence to the performatively stable point. |
| Researcher Affiliation | Academia | Pedram Khorsandi Mila, Quebec AI Institute Université de Montréal Rushil Gupta Mila, Quebec AI Institute Université de Montréal Mehrnaz Mofakhami Mila, Quebec AI Institute Université de Montréal Simon Lacoste-Julien Mila, Quebec AI Institute Université de Montréal Canada CIFAR AI Chair Gauthier Gidel Mila, Quebec AI Institute Université de Montréal Canada CIFAR AI Chair Correspondence to Pedram Khorsandi: <EMAIL> |
| Pseudocode | No | The paper describes algorithms such as Repeated Risk Minimization (RRM) and Affine Risk Minimizers (ARM) using mathematical equations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Figure 1: An example showing that using older snapshots (purple) speeds up convergence to the stable point (orange star) compared to only the latest snapshot (red). The implementation is provided in the code. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Code will be provided in the supplementary material. |
| Open Datasets | Yes | Our data comes from Kaggle s Give Me Some Credit dataset4, which includes features x R11 and labels y {0, 1}, where y = 1 indicates a defaulting applicant. 4Give me Some Credit Dataset, 2011: https://www.kaggle.com/c/Give Me Some Credit The simulations use the publicly available Uber and Lyft dataset from Boston, MA on Kaggle7. 7Uber and Lyft dataset from Boston, MA, 2019: https://www.kaggle.com/datasets/brllrb/ uber-and-lyft-dataset-boston-ma |
| Dataset Splits | No | In Section 7.1, the paper states: "We partition the features into two sets: strategic and non-strategic." This describes a partition of features within the dataset, not a train/test/validation split of the dataset itself for model training and evaluation. No other specific dataset splits are mentioned for reproducing experiments. |
| Hardware Specification | Yes | The RRM procedure is carried out for a maximum of 50 iterations with a learning rate of 3e-4 and Adam optimizer run over A100-40G GPUs. Each player takes an action in this game by setting their price for the riders across 11 different locations in the same city of Boston, MA. The price set by one firm directly influences the demand observed by both firms. The demand constitutes the data distribution and at each time step, a total of 25 demand samples are sampled for a firm i, and the optimal response is found by minimizing Equation 53 for a maximum of 40 retraining steps on CPU. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" in the experimental setup (Section I and J). However, it does not specify version numbers for Adam or any other software libraries (e.g., PyTorch, TensorFlow, scikit-learn, etc.) that would be necessary for reproducible software dependencies. |
| Experiment Setup | Yes | The RRM procedure is carried out for a maximum of 50 iterations with a learning rate of 3e-4 and Adam optimizer run over A100-40G GPUs. Each player observes a revenue of z T i xi. Thus, the loss function that each player i seeks to minimize in the RRM framework can be described as: xt+1 i = arg min xi Ezi Dt z T i xi + λ 2 xi 2 (53) where λ is a hyperparameter for the regularization term (= 70 for our experiments). For any player i, each element of xi is clipped to be between the range of [ 30, 30] and the initial price x0 i is sampled randomly from a uniform distribution on [0, 1]. |