Addressing the Long-term Impact of ML Decisions via Policy Regret

Authors: David Lindner, Hoda Heidari, Andreas Krause

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons. In this section, we empirically investigate the effectiveness of our noise-handling approach on several datasets."
Researcher Affiliation | Academia | "¹ETH Zurich, ²Carnegie Mellon University"
Pseudocode | Yes | "Algorithm 1 The Single-Peaked Optimism (SPO) algorithm."
Open Source Code | Yes | "Code to reproduce all of our experiments can be found at https://github.com/david-lindner/single-peaked-bandits."
Open Datasets | Yes | "Motivated by our initial example of a budget planner in Section 1, we simulate a credit lending scenario based on the FICO credit scoring dataset from 2003 [Reserve, 2007]."
Dataset Splits | No | The paper uses datasets like FICO and synthetic data but does not explicitly provide details about training, validation, or test splits (e.g., percentages, sample counts, or specific splitting methodology) for its experiments.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like 'Python 3.8' or 'PyTorch 1.9') are explicitly mentioned in the paper.
Experiment Setup | Yes | "We consider three datasets: (1) a set of synthetic reward functions, (2) a simulation of a user interacting with a recommender system, and (3) a dataset constructed from the FICO credit scoring data. ... We assume the user's inherent preferences stay constant, but the novelty factor decays when showing an item more often. ... The reward is f_i(0) = 0 for never showing an item, and subsequent rewards are defined as f_i(t) = f_i(t-1) + n γ^t - c (f_i(t-1) - v)."
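
The recursive reward definition in the last row is easy to sanity-check in code. Below is a minimal Python sketch, assuming the update f_i(t) = f_i(t-1) + n γ^t - c (f_i(t-1) - v) as quoted above; the parameter values are illustrative choices only, and the authors' actual experiment configuration lives in the linked single-peaked-bandits repository.

```python
# Illustrative sketch of the quoted recommender reward recursion:
#   f_i(0) = 0,   f_i(t) = f_i(t-1) + n * gamma**t - c * (f_i(t-1) - v)
# All parameter values below are arbitrary choices for illustration,
# not the settings used in the paper's experiments.

def item_rewards(n, gamma, c, v, horizon):
    """Rewards f_i(1), ..., f_i(horizon) for showing a single item repeatedly.

    n:     novelty factor of the item
    gamma: novelty decay rate in (0, 1)
    c:     rate at which the reward reverts to the inherent preference
    v:     the user's (constant) inherent preference for the item
    """
    f = 0.0  # f_i(0) = 0: the item has never been shown
    rewards = []
    for t in range(1, horizon + 1):
        f = f + n * gamma ** t - c * (f - v)  # novelty boost + mean reversion
        rewards.append(f)
    return rewards

# The reward rises while the decaying novelty term dominates and then
# settles back toward the inherent preference v.
print(item_rewards(n=1.0, gamma=0.8, c=0.2, v=0.5, horizon=10))
```

With these dynamics the reward is single-peaked in the number of times an item is shown, which matches the single-peaked reward structure that the paper's SPO algorithm is designed to exploit.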