Addressing the Long-term Impact of ML Decisions via Policy Regret
Authors: David Lindner, Hoda Heidari, Andreas Krause
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons. In this section, we empirically investigate the effectiveness of our noise-handling approach on several datasets. |
| Researcher Affiliation | Academia | ETH Zurich; Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 The Single-Peaked Optimism (SPO) algorithm. |
| Open Source Code | Yes | Code to reproduce all of our experiments can be found at https://github.com/david-lindner/single-peaked-bandits. |
| Open Datasets | Yes | Motivated by our initial example of a budget planner in Section 1, we simulate a credit lending scenario based on the FICO credit scoring dataset from 2003 [Reserve, 2007]. |
| Dataset Splits | No | The paper uses datasets like FICO and synthetic data but does not explicitly provide details about training, validation, or test splits (e.g., percentages, sample counts, or specific splitting methodology) for their experiments. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like 'Python 3.8' or 'PyTorch 1.9') are explicitly mentioned in the paper. |
| Experiment Setup | Yes | We consider three datasets: (1) a set of synthetic reward functions, (2) a simulation of a user interacting with a recommender system, and (3) a dataset constructed from the FICO credit scoring data. ... We assume the user's inherent preferences stay constant, but the novelty factor decays when showing an item more often. ... The reward is f_i(0) = 0 for never showing an item, and subsequent rewards are defined as f_i(t) = f_i(t-1) + n·γ^t − c·(f_i(t) − v). |
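
The experiment-setup excerpt describes the recommender-system simulation through the reward recursion quoted above. The following is a minimal sketch of that recursion; the parameter values (novelty weight `n`, decay `gamma`, convergence rate `c`, inherent value `v`) and the function name are illustrative assumptions, not the paper's settings.

```python
# Sketch of the quoted recursion: f_i(0) = 0 and
# f_i(t) = f_i(t-1) + n*gamma^t - c*(f_i(t) - v), solved for f_i(t).
# Parameter defaults are hypothetical placeholders, not taken from the paper.

def item_rewards(horizon: int, n: float = 1.0, gamma: float = 0.9,
                 c: float = 0.1, v: float = 0.5) -> list[float]:
    """Reward f_i(t) after showing an item t times (assumed parameters)."""
    f = [0.0]  # f_i(0) = 0: the item has never been shown
    for t in range(1, horizon + 1):
        # Rearranging the recursion: f(t) * (1 + c) = f(t-1) + n*gamma^t + c*v
        f.append((f[-1] + n * gamma ** t + c * v) / (1.0 + c))
    return f

if __name__ == "__main__":
    rewards = item_rewards(horizon=30)
    # The sequence rises while the novelty term n*gamma^t dominates, then
    # decays toward the inherent value v, giving a single-peaked reward curve.
    print([round(r, 3) for r in rewards])
```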