Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Look-Ahead Reasoning on Learning Platforms

Authors: Haiqing Zhu, Tijana Zrnic, Celestine Mendler-Dünner

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our theoretical findings empirically. We adapt the credit-scoring simulator from [Perdomo et al., 2020] that models how a lending institution classifies loan applicants by creditworthiness. We first focus on the results related to level-k reasoning from Section 3 and then offer empirical insights into the trade-offs of collective reasoning from Section 4 and Section 5. [...] In Figure 1 we report the speed of convergence by presenting the iterate gap $\|\theta_{t+1} - \theta_t\|_2$ against the number of iterations. We choose $\epsilon = 0.5$. First, we can see that under all three mixture weights the gap tends to zero and the dynamics converge. As the fraction of higher levels of thinking increases, the speed of convergence increases, which confirms our theoretical finding in Theorem 3. We also verified empirically that the dynamics converge to a unique equilibrium independent of $\alpha$.
Researcher Affiliation	Academia	Haiqing Zhu (Australian National University, EMAIL); Tijana Zrnic (Stanford University, EMAIL); Celestine Mendler-Dünner (ELLIS Institute Tübingen; MPI for Intelligent Systems, Tübingen; Tübingen AI Center; EMAIL)
Pseudocode	No	The paper describes methods using mathematical formulations and definitions but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code	Yes	For the implementation of the simulation, see https://github.com/haiqingzhu543/Look-Ahead-Reasoningon-Learning-Platforms; for the dataset, see https://www.kaggle.com/c/GiveMeSomeCredit.
Open Datasets	Yes	For the implementation of the simulation, see https://github.com/haiqingzhu543/Look-Ahead-Reasoningon-Learning-Platforms; for the dataset, see https://www.kaggle.com/c/GiveMeSomeCredit.
Dataset Splits	No	The error bars indicate one standard deviation over 10 different train-test splits.
Hardware Specification	No	All simulations in this paper can be run on a standard personal laptop.
Software Dependencies	No	The paper does not explicitly list specific software dependencies with version numbers within the text of the main paper or appendices. A GitHub link is provided, but the requirements are not detailed in the paper itself.
Experiment Setup	Yes	Assume the learner fits a logistic regression classifier $\theta$ using cross-entropy loss. The data has 10 features and we assume agents can manipulate the subset $S = \{\text{remaining credit card balance}, \text{open credit lines}, \text{number of real estate loans}\}$. Given some $\epsilon > 0$, the utility of the agents is given by $u_\epsilon((x, y), \theta) = -\langle \theta, x \rangle - \frac{1}{2\epsilon}\|x_0 - x\|_2^2$, where $x_0$ is their feature value under $\mathcal{D}_0$. The best response of the agents is given by $x_S = (x_0)_S - \epsilon\theta_S$, where $S$ indexes the strategic features. Note that this corresponds to the strategy for agents reasoning at level-1; indeed, assuming other agents are non-strategic implies strategizing against a fixed model. It is not hard to see that the resulting distribution map $\mathcal{D}_1(\theta)$ is $\epsilon$-sensitive. Under this model we simulate the repeated retraining dynamics for mixed populations of level-1 and level-2 thinkers of varying proportion. In Figure 1 we report the speed of convergence by presenting the iterate gap $\|\theta_{t+1} - \theta_t\|_2$ against the number of iterations. We choose $\epsilon = 0.5$. [...] We again consider a learner that trains a logistic regression classifier using cross-entropy loss. For the population, we consider strategies that consist of misreporting a single feature. We choose this to be either age or number of dependents; [...] We approximate this optimal strategy using gradient descent with learning rate 0.01 and 250 epochs. [...] The target model $\theta_{\text{target}}$ we used in the simulation was fixed to be the model produced by the strategy with parameters $(\eta = 0.5, \alpha = 0.3)$.
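The level-1 best response and the "not hard to see" $\epsilon$-sensitivity claim in the Experiment Setup row follow from a short calculation, sketched below. This derivation is not taken from the paper. It assumes the utility carries a leading $-\langle \theta, x \rangle$ term (the sign that the stated best response $x_S - \epsilon\theta_S$ requires) and that only the coordinates in $S$ may change. Because every agent shifts by the same vector, pairing each agent's response to $\theta$ with its response to $\theta'$ is a valid coupling, which bounds the Wasserstein-1 distance.

```latex
\begin{align*}
% Utility; non-strategic coordinates stay at their D_0 values
u_\epsilon((x,y),\theta) &= -\langle \theta, x \rangle - \frac{1}{2\epsilon}\lVert x_0 - x \rVert_2^2,
  \qquad x_j = (x_0)_j \ \text{for } j \notin S,\\
% First-order condition of a strictly concave objective gives the maximizer
\nabla_{x_S} u_\epsilon &= -\theta_S + \frac{1}{\epsilon}\bigl((x_0)_S - x_S\bigr) = 0
  \;\Longrightarrow\; x_S^{\star} = (x_0)_S - \epsilon\,\theta_S,\\
% Coupling each agent's two responses yields epsilon-sensitivity
W_1\bigl(\mathcal{D}_1(\theta), \mathcal{D}_1(\theta')\bigr)
  &\le \epsilon\,\lVert \theta_S - \theta'_S \rVert_2
  \le \epsilon\,\lVert \theta - \theta' \rVert_2 .
\end{align*}
```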
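A minimal Python sketch of the level-1 repeated-retraining loop described in the Experiment Setup row, tracking the iterate gap $\|\theta_{t+1} - \theta_t\|_2$ that Figure 1 plots. It is not the authors' implementation (that is in the linked repository), and several pieces are assumptions: synthetic Gaussian features stand in for the Kaggle credit data; the names `best_response`, `fit_logreg`, and `repeated_retraining` are hypothetical; the learner adds an L2 penalty with λ = 1 on every coordinate, including the intercept, and is solved with Newton's method (the source only states logistic regression with cross-entropy loss); and every agent is a level-1 thinker, so the paper's level-1/level-2 mixtures are not modeled. The best response follows the $x_S - \epsilon\theta_S$ form with ε = 0.5, as stated.

```python
import numpy as np


def best_response(X0, theta, strategic, eps):
    """Level-1 best response: shift the strategic columns by -eps * theta_S.

    theta holds the feature weights followed by the intercept (last entry),
    so indexing it with feature indices in `strategic` picks theta_S.
    Agents always respond from their original features X0.
    """
    X = X0.copy()
    X[:, strategic] -= eps * theta[strategic]
    return X


def fit_logreg(X, y, lam, theta0=None, iters=25):
    """L2-regularized logistic regression (cross-entropy) via Newton's method.

    Minimizes mean cross-entropy + lam/2 * ||theta||^2. A column of ones is
    appended, so the last entry of theta is the (also regularized) intercept.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    n, p = Xb.shape
    theta = np.zeros(p) if theta0 is None else theta0.copy()
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-Xb @ theta))
        grad = Xb.T @ (prob - y) / n + lam * theta
        hess = (Xb.T * (prob * (1.0 - prob))) @ Xb / n + lam * np.eye(p)
        theta -= np.linalg.solve(hess, grad)
    return theta


def repeated_retraining(X0, y, strategic, eps, lam=1.0, rounds=30):
    """Retrain on D_1(theta_t) each round and record ||theta_{t+1} - theta_t||_2."""
    theta = fit_logreg(X0, y, lam)  # theta_0: trained on the unshifted D_0
    gaps = []
    for _ in range(rounds):
        X_shifted = best_response(X0, theta, strategic, eps)  # sample of D_1(theta_t)
        new_theta = fit_logreg(X_shifted, y, lam, theta0=theta)  # warm start
        gaps.append(float(np.linalg.norm(new_theta - theta)))
        theta = new_theta
    return theta, gaps


# Hypothetical synthetic data: 5 standard-normal features, the first 3 strategic.
rng = np.random.default_rng(0)
n, d = 1000, 5
X0 = rng.standard_normal((n, d))
w_true = np.array([1.0, -1.0, 0.5, 0.5, -0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X0 @ w_true))).astype(float)

theta, gaps = repeated_retraining(X0, y, strategic=[0, 1, 2], eps=0.5)
print("first gaps:", gaps[:3], "last gap:", gaps[-1])
```

Each round, agents respond from their original features $x_0$, so the data at round $t$ is a sample from $\mathcal{D}_1(\theta_t)$ rather than a cumulative drift. The strong penalty keeps this toy map contracting, so the gaps shrink geometrically; weaker regularization or a larger ε may slow or prevent convergence.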