Zero-Shot Assistance in Sequential Decision Problems

Authors: Sebastiaan De Peuter, Samuel Kaski

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We show experimentally that our approach adapts to these agent biases, and results in higher cumulative reward for the agent than automation-based alternatives. Lastly, we show that an approach combining advice and automation outperforms advice alone at the cost of losing some safety guarantees. ... In simulation experiments we show that (1) AIAD significantly outperforms these automation-based baselines. ... We also show that (2) an assistant which infers and accounts for agent biases outperforms one that does not.'
Researcher Affiliation | Academia | (1) Department of Computer Science, Aalto University, Espoo, Finland; (2) Department of Computer Science, University of Manchester, Manchester, UK
Pseudocode | No | The paper describes its algorithms in prose but includes no structured pseudocode or algorithm blocks; it refers the reader to the appendices instead: 'We give a short overview of the algorithm here, and refer the reader to the appendices for a detailed explanation.'
Open Source Code | Yes | 'Our code is available from https://github.com/AaltoPML/Zero-Shot-Assistance-in-Sequential-Decision-Problems.'
Open Datasets | No | The paper uses custom simulation environments (a 'day trip design problem' and an 'inventory management problem') and generates data by sampling parameters for each run rather than using a predefined, publicly available dataset: 'For every run we sampled new agent model parameters θ, ω and a new set of POIs.'
Dataset Splits | No | The paper reports how often each simulation was run ('We ran this experiment 75 times.', 'We ran this experiment 20 times') but specifies no traditional training/validation/test splits. Instead it describes the interaction counts at which methods are evaluated within the simulated runs: 'The PL and IRL baselines are evaluated at 0, 5, 10, 15, 20, 25 and 30 interactions, while the other methods are evaluated on a continuous range of N from 1 to 30.'
Hardware Specification | No | The paper acknowledges 'the computational resources provided by the Aalto Science-IT project' but does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiments.
Experiment Setup | Yes | The paper gives concrete details of its simulation setup: the number of runs ('We ran this experiment 75 times.'), episode lengths ('episodes of 50 time steps'), how agent parameters were sampled ('For every run we sampled new agent model parameters θ, ω and a new set of POIs'), and the evaluation points ('The PL and IRL baselines are evaluated at 0, 5, 10, 15, 20, 25 and 30 interactions'); see the sketch after this table.
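
To make the reported protocol concrete, below is a minimal, hypothetical Python sketch of the simulation loop the Dataset Splits and Experiment Setup rows describe. Only the constants (75 runs, 50-step episodes, evaluation at 0-30 interactions, fresh θ, ω and POIs per run) come from the paper's quotes above; the functions sample_agent_params, sample_pois, and run_episode are invented placeholders, not the authors' code.

```python
import random

N_RUNS = 75                               # 'We ran this experiment 75 times.'
EPISODE_LENGTH = 50                       # 'episodes of 50 time steps'
EVAL_POINTS = [0, 5, 10, 15, 20, 25, 30]  # PL/IRL baseline evaluation points

def sample_agent_params(rng):
    # Placeholder: the paper samples agent model parameters theta and omega
    # from priors it defines; these distributions are invented stand-ins.
    return {"theta": rng.gauss(0.0, 1.0), "omega": rng.gauss(0.0, 1.0)}

def sample_pois(rng, n_pois=20):
    # Placeholder for drawing a fresh set of points of interest (POIs) for
    # the day trip design problem; n_pois is an assumed value.
    return [(rng.random(), rng.random()) for _ in range(n_pois)]

def run_episode(params, pois, n_interactions, rng):
    # Placeholder for one 50-step episode after n_interactions rounds of
    # assistance; returns a dummy cumulative reward.
    return sum(rng.random() for _ in range(EPISODE_LENGTH))

def main():
    rng = random.Random(0)
    rewards = {n: [] for n in EVAL_POINTS}
    for _ in range(N_RUNS):
        # 'For every run we sampled new agent model parameters theta, omega
        # and a new set of POIs.'
        params = sample_agent_params(rng)
        pois = sample_pois(rng)
        for n in EVAL_POINTS:
            rewards[n].append(run_episode(params, pois, n, rng))
    for n in EVAL_POINTS:
        mean = sum(rewards[n]) / len(rewards[n])
        print(f"{n:2d} interactions: mean cumulative reward {mean:.2f}")

if __name__ == "__main__":
    main()
```

In the actual experiments, run_episode would roll out the simulated agent (with its biases) under the assistant's advice or automation and return the true cumulative reward; the skeleton above only fixes the run structure and evaluation points reported in the paper.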