Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Incentivize Improvements from Strategic Agents
Authors: Yatong Chen, Jialu Wang, Yang Liu
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation. Our empirical results show that our method outperforms existing approaches, even when some feature types are misspecified. |
| Researcher Affiliation | Academia | Yatong Chen EMAIL Computer Science and Engineering University of California, Santa Cruz Jialu Wang EMAIL Computer Science and Engineering University of California, Santa Cruz Yang Liu EMAIL Computer Science and Engineering University of California, Santa Cruz |
| Pseudocode | Yes | Algorithm 1 Best Response for Non-Linear Model. Input: non-linear classifier h, an individual data point x. Result: x^I. Step 1: call LIME to get the approximated weights w of a local linear classifier for the non-linear model h around the individual point x. Step 2: substitute w into Eq. (5) and Eq. (6) to get x^I, respectively. |
| Open Source Code | Yes | The details for reproducing our experimental results can be found at https://github.com/UCSC-REAL/Constructive_Adaptation. |
| Open Datasets | Yes | We consider five datasets: toy, a synthetic dataset based on the causal DAG in Fig. 1; credit, a dataset for predicting whether an individual will default on an upcoming credit payment (Yeh & Lien, 2009); adult, a census-based dataset for predicting adult annual incomes; german, a dataset to assess credit risk in loans; and spambase, a dataset for email spam detection. The last three are from the UCI ML Repository (Dua & Graff, 2017). |
| Dataset Splits | Yes | We run each method with 5-fold cross-validation and report the following: |
| Hardware Specification | Yes | We conducted all experiments on a 3 GHz 6-Core Intel Core i5 CPU. |
| Software Dependencies | No | The paper does not explicitly state software dependencies with version numbers; it mentions the computational cost and the CPU used, but names no software packages. |
| Experiment Setup | Yes | CA, a linear logistic regression classifier obtained by solving the optimization program in Eq. (23), a smooth differentiable surrogate of the objective in Eq. (7), using the BFGS algorithm (Byrd et al., 1995); please refer to Appendix D.3 for a detailed derivation. CA represents our approach. Effect of trade-off parameter λ: Fig. 2 shows the performance of linear classifiers for different values of λ on four real datasets. In general, we observe a trade-off between the improvement rate and deployment error: both increase as λ increases from 0.01 to 10 in all four datasets. |
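The pseudocode excerpt (Algorithm 1) describes computing an agent's best response to a non-linear classifier by first fitting a local linear approximation around the agent's point, then plugging its weights into a closed-form response. The sketch below is an illustrative assumption, not the paper's method: it replaces LIME with a finite-difference local linearization, and since Eq. (5)/(6) are not reproduced in this report, it uses a generic quadratic-cost best response (move along the local weight vector to the decision boundary if the movement cost fits a budget; `budget` is a hypothetical parameter).

```python
import numpy as np

def local_linear_weights(h, x, eps=1e-4):
    # Step 1 stand-in: approximate the non-linear score h by a local
    # linear model around x via central finite differences.
    # (The paper uses LIME for this step.)
    w = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        d = np.zeros_like(x, dtype=float)
        d[i] = eps
        w[i] = (h(x + d) - h(x - d)) / (2 * eps)
    return w

def best_response(h, x, budget=2.0):
    # Step 2 stand-in (NOT the paper's Eq. (5)/(6)): under a quadratic
    # movement cost, the cheapest point on the local decision boundary
    # w·(x' - x) + h(x) = 0 lies along w; move there only if the squared
    # movement cost stays within the (hypothetical) budget.
    w = local_linear_weights(h, x)
    nrm2 = float(w @ w)
    if h(x) >= 0 or nrm2 == 0:
        return x.copy()          # already accepted, or locally flat model
    step = -h(x) / nrm2          # distance along w to reach h ≈ 0
    if step * step * nrm2 > budget:
        return x.copy()          # moving costs more than it is worth
    return x + step * w
```

For a linear score the local weights are exact, so the response lands on the decision boundary whenever the budget allows.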
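The experiment-setup and dataset-split rows report fitting a linear logistic regression classifier with the (L-)BFGS optimizer (Byrd et al., 1995) and evaluating it with 5-fold cross-validation. A minimal sketch of that evaluation loop, assuming plain log-loss rather than the paper's surrogate objective Eq. (23) (which adds λ-weighted improvement/manipulation terms not reproduced here); labels are assumed to be in {-1, +1}:

```python
import numpy as np
from scipy.optimize import minimize

def fit_logreg(X, y):
    # Fit logistic regression by minimizing the mean log-loss with
    # L-BFGS-B, the optimizer family the paper cites (Byrd et al., 1995).
    def nll(w):
        z = X @ w[:-1] + w[-1]                 # last entry is the bias
        return np.mean(np.logaddexp(0.0, -y * z))
    w0 = np.zeros(X.shape[1] + 1)
    return minimize(nll, w0, method="L-BFGS-B").x

def cv_accuracy(X, y, k=5, seed=0):
    # 5-fold cross-validation as described in the dataset-splits row:
    # shuffle, split into k folds, train on k-1, test on the held-out fold.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_logreg(X[train], y[train])
        pred = np.sign(X[test] @ w[:-1] + w[-1])
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))
```

On a linearly separable toy problem this loop should recover near-perfect held-out accuracy; the paper's actual runs swap in the surrogate objective and the five datasets listed above.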