Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Incentivize Improvements from Strategic Agents

Authors: Yatong Chen, Jialu Wang, Yang Liu

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation. Our empirical results show that our method outperforms existing approaches, even when some feature types are misspecified.
Researcher Affiliation | Academia | Yatong Chen (EMAIL), Jialu Wang (EMAIL), Yang Liu (EMAIL); all with Computer Science and Engineering, University of California, Santa Cruz.
Pseudocode | Yes | Algorithm 1: Best Response for Non-Linear Model. Input: non-linear classifier h, an individual data point x. Result: x^I. Step 1: call LIME to get the approximated weights w of a local linear classifier for the non-linear model h around the individual point x. Step 2: substitute w into Eq. (5) and Eq. (6) to get x^I, respectively.
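For intuition about Step 2 in the linear case, the sketch below computes a minimal-cost best response against a (local) linear classifier w·x + b, moving the point just across the decision boundary. This is a hedged illustration, not the paper's exact Eq. (5)/(6): the quadratic (L2) cost assumption, the function name, and the `eps` overshoot are all choices made here for the example.

```python
import math

def linear_best_response(w, b, x, eps=1e-3):
    """Minimal L2-norm move that pushes x across the hyperplane w.x + b = 0.

    If x is already classified positively, no move is needed; otherwise the
    cheapest quadratic-cost response is an orthogonal projection onto the
    boundary, overshooting by eps to land strictly on the positive side.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 0:
        return list(x)  # already positive: best response is to stay put
    norm_sq = sum(wi * wi for wi in w)
    step = (-score + eps) / norm_sq  # distance along w to cross the boundary
    return [xi + step * wi for wi, xi in zip(w, x)]
```

For a non-linear model h, Algorithm 1 would first obtain w (and b) from LIME's local linear approximation around x, then apply a closed form of this kind.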
Open Source Code | Yes | The details for reproducing our experimental results can be found at https://github.com/UCSC-REAL/Constructive Adaptation.
Open Datasets | Yes | We consider five datasets: toy, a synthetic dataset based on the causal DAG in Fig. 1; credit, a dataset for predicting whether an individual will default on an upcoming credit payment (Yeh & Lien, 2009); adult, a census-based dataset for predicting adult annual incomes; german, a dataset to assess credit risk in loans; and spambase, a dataset for email spam detection. The last three are from the UCI ML Repository (Dua & Graff, 2017).
Dataset Splits | Yes | We run each method with 5-fold cross-validation and report the following:
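The 5-fold protocol quoted above can be sketched with a minimal, dependency-free index splitter (a stand-in for a library utility such as scikit-learn's KFold; the helper name is illustrative):

```python
def kfold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Splits range(n) into k contiguous, near-equal folds; each fold serves
    once as the held-out test set while the rest form the training set.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx = list(range(n))
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

In practice one would shuffle the indices (with a fixed seed) before folding; the contiguous split here keeps the example short.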
Hardware Specification | Yes | We conducted all experiments on a 3 GHz 6-Core Intel Core i5 CPU.
Software Dependencies | No | The paper does not state specific software dependencies or version numbers; it mentions computational cost and the CPU used, but not the software environment.
Experiment Setup | Yes | CA: a linear logistic regression classifier obtained by solving the optimization program in Eq. (23), a smooth, differentiable surrogate of the objective in Eq. (7) (see Appendix D.3 for a detailed derivation), using the BFGS algorithm (Byrd et al., 1995). CA represents our approach. Effect of the trade-off parameter λ: Fig. 2 shows the performance of linear classifiers for different values of λ on four real datasets. In general, we observe a trade-off between the improvement rate and deployment error: both increase as λ increases from 0.01 to 10 on all four datasets.
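To make the λ trade-off concrete, the toy sketch below fits a 1-D classifier on a logistic loss plus a λ-weighted term that rewards larger scores, a loose stand-in for the improvement component of the surrogate objective in Eq. (23). Plain gradient descent replaces the paper's BFGS solver to keep the example dependency-free; the function name and the form of the λ term are assumptions made here, not the paper's formulation.

```python
import math

def fit_tradeoff(points, labels, lam, lr=0.1, steps=500):
    """Fit score(x) = w*x + b by gradient descent on
    logistic loss + lam * (illustrative improvement-style term).

    Larger lam pushes the learned weight w up, trading deployment
    accuracy for a stronger incentive signal, mirroring the trend in Fig. 2.
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(points, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            gw += (p - y) * x                         # logistic-loss gradient
            gb += (p - y)
        gw += -lam  # illustrative lam-weighted term favoring larger margins
        w -= lr * gw / len(points)
        b -= lr * gb / len(points)
    return w, b
```

Running this with increasing λ shows the weight (and hence the induced "improvement" pressure) growing, at the cost of a classifier that drifts away from the pure accuracy optimum.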