Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Addressing the Long-term Impact of ML Decisions via Policy Regret
Authors: David Lindner, Hoda Heidari, Andreas Krause
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons. In this section, we empirically investigate the effectiveness of our noise-handling approach on several datasets. |
| Researcher Affiliation | Academia | 1ETH Zurich 2Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 The Single-Peaked Optimism (SPO) algorithm. |
| Open Source Code | Yes | Code to reproduce all of our experiments can be found at https://github.com/david-lindner/single-peaked-bandits. |
| Open Datasets | Yes | Motivated by our initial example of a budget planner in Section 1, we simulate a credit lending scenario based on the FICO credit scoring dataset from 2003 [Reserve, 2007]. |
| Dataset Splits | No | The paper uses datasets like FICO and synthetic data but does not explicitly provide details about training, validation, or test splits (e.g., percentages, sample counts, or specific splitting methodology) for their experiments. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like 'Python 3.8' or 'PyTorch 1.9') are explicitly mentioned in the paper. |
| Experiment Setup | Yes | We consider three datasets: (1) a set of synthetic reward functions, (2) a simulation of a user interacting with a recommender system, and (3) a dataset constructed from the FICO credit scoring data. ... We assume the user's inherent preferences stay constant, but the novelty factor decays when showing an item more often. ... The reward is f_i(0) = 0 for never showing an item, and subsequent rewards are defined as f_i(t) = f_i(t-1) + n·γ^t − c·(f_i(t-1) − v). |
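The reward recursion quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration assuming the reconstructed form f_i(t) = f_i(t-1) + n·γ^t − c·(f_i(t-1) − v); the parameter values below are illustrative placeholders, not taken from the paper.

```python
def reward_sequence(T, n=1.0, gamma=0.9, c=0.1, v=0.5):
    """Rewards f_i(0..T) for repeatedly showing a single item.

    Assumes the recursion f_i(t) = f_i(t-1) + n * gamma**t - c * (f_i(t-1) - v),
    where n * gamma**t is a decaying novelty bonus and c * (f_i(t-1) - v)
    pulls the reward back toward the baseline value v.
    Parameter values here are hypothetical, chosen only for illustration.
    """
    f = [0.0]  # f_i(0) = 0: the item has never been shown
    for t in range(1, T + 1):
        prev = f[-1]
        f.append(prev + n * gamma**t - c * (prev - v))
    return f
```

Under these placeholder parameters the reward rises while the novelty term dominates and then settles as it decays, which matches the paper's description of a decaying novelty factor over constant inherent preferences.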