Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Incentivize Improvements from Strategic Agents
Authors: Yatong Chen, Jialu Wang, Yang Liu
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation. Our empirical results show that our method outperforms existing approaches, even when some feature types are misspecified. |
| Researcher Affiliation | Academia | Yatong Chen EMAIL Computer Science and Engineering University of California, Santa Cruz Jialu Wang EMAIL Computer Science and Engineering University of California, Santa Cruz Yang Liu EMAIL Computer Science and Engineering University of California, Santa Cruz |
| Pseudocode | Yes | Algorithm 1 Best Response for Non-Linear Model. Input: non-linear classifier h, an individual data point x. Result: x^I. Step 1: call LIME to get the approximated weights w of a local linear classifier for the non-linear model h around the individual point x. Step 2: substitute w into Eq. (5) and Eq. (6) to get x^I, respectively. |
| Open Source Code | Yes | The details for reproducing our experimental results can be found at https://github.com/UCSC-REAL/Constructive_Adaptation. |
| Open Datasets | Yes | We consider five datasets: toy, a synthetic dataset based on the causal DAG in Fig. 1; credit, a dataset for predicting whether an individual will default on an upcoming credit payment (Yeh & Lien, 2009); adult, a census-based dataset for predicting adult annual incomes; german, a dataset to assess credit risk in loans; and spambase, a dataset for email spam detection. The last three are from the UCI ML Repository (Dua & Graff, 2017). |
| Dataset Splits | Yes | We run each method with 5-fold cross-validation and report the following: |
| Hardware Specification | Yes | We conducted all experiments on a 3 GHz 6-Core Intel Core i5 CPU. |
| Software Dependencies | No | The paper does not explicitly state software dependencies with version numbers; it mentions the computational cost and the CPU used, but names no software packages. |
| Experiment Setup | Yes | CA, a linear logistic regression classifier obtained by solving the optimization program in Eq. (23), a smooth differentiable surrogate of the objective in Eq. (7), using the BFGS algorithm (Byrd et al., 1995); please refer to Appendix D.3 for a detailed derivation. CA represents our approach. Effect of trade-off parameter λ: Fig. 2 shows the performance of linear classifiers for different values of λ on four real datasets. In general, we observe a trade-off between the improvement rate and deployment error: both increase as λ increases from 0.01 to 10 in all four datasets. |
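The pseudocode excerpt (Algorithm 1) describes computing an agent's best response to a non-linear classifier by first fitting a local linear approximation around the agent's point, then plugging its weights into a closed-form response. The sketch below is an illustrative assumption, not the paper's method: it replaces LIME with a finite-difference local linearization, and since Eq. (5)/(6) are not reproduced in this report, it uses a generic quadratic-cost best response (move along the local weight vector to the decision boundary if the movement cost fits a budget; `budget` is a hypothetical parameter).

```python
import numpy as np

def local_linear_weights(h, x, eps=1e-4):
    # Step 1 stand-in: approximate the non-linear score h by a local
    # linear model around x via central finite differences.
    # (The paper uses LIME for this step.)
    w = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        d = np.zeros_like(x, dtype=float)
        d[i] = eps
        w[i] = (h(x + d) - h(x - d)) / (2 * eps)
    return w

def best_response(h, x, budget=2.0):
    # Step 2 stand-in (NOT the paper's Eq. (5)/(6)): under a quadratic
    # movement cost, the cheapest point on the local decision boundary
    # w·(x' - x) + h(x) = 0 lies along w; move there only if the squared
    # movement cost stays within the (hypothetical) budget.
    w = local_linear_weights(h, x)
    nrm2 = float(w @ w)
    if h(x) >= 0 or nrm2 == 0:
        return x.copy()          # already accepted, or locally flat model
    step = -h(x) / nrm2          # distance along w to reach h ≈ 0
    if step * step * nrm2 > budget:
        return x.copy()          # moving costs more than it is worth
    return x + step * w
```

For a linear score the local weights are exact, so the response lands on the decision boundary whenever the budget allows.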
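The experiment-setup and dataset-split rows report fitting a linear logistic regression classifier with the (L-)BFGS optimizer (Byrd et al., 1995) and evaluating it with 5-fold cross-validation. A minimal sketch of that evaluation loop, assuming plain log-loss rather than the paper's surrogate objective Eq. (23) (which adds λ-weighted improvement/manipulation terms not reproduced here); labels are assumed to be in {-1, +1}:

```python
import numpy as np
from scipy.optimize import minimize

def fit_logreg(X, y):
    # Fit logistic regression by minimizing the mean log-loss with
    # L-BFGS-B, the optimizer family the paper cites (Byrd et al., 1995).
    def nll(w):
        z = X @ w[:-1] + w[-1]                 # last entry is the bias
        return np.mean(np.logaddexp(0.0, -y * z))
    w0 = np.zeros(X.shape[1] + 1)
    return minimize(nll, w0, method="L-BFGS-B").x

def cv_accuracy(X, y, k=5, seed=0):
    # 5-fold cross-validation as described in the dataset-splits row:
    # shuffle, split into k folds, train on k-1, test on the held-out fold.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_logreg(X[train], y[train])
        pred = np.sign(X[test] @ w[:-1] + w[-1])
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))
```

On a linearly separable toy problem this loop should recover near-perfect held-out accuracy; the paper's actual runs swap in the surrogate objective and the five datasets listed above.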