Environment Design for Biased Decision Makers

Authors: Guanghui Yu, Chien-Ju Ho

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct both simulations and real human-subject experiments with workers recruited from Amazon Mechanical Turk to evaluate our proposed algorithms.
Researcher Affiliation | Academia | Guanghui Yu and Chien-Ju Ho Washington University in St. Louis {guanghuiyu, chienju.ho}@wustl.edu
Pseudocode | Yes | Algorithm 1 Gradient-based Algorithm for Solving (4)
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes generating 1,000 environments for simulations and recruiting workers for human-subject experiments. It does not use or provide access information for a pre-existing, publicly available dataset in the context of training or evaluation.
Dataset Splits | No | The paper does not specify distinct training, validation, and test dataset splits with percentages or sample counts for reproducing experiments. It describes generating simulation environments and conducting human-subject experiments directly.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running its simulations or conducting human-subject experiments.
Software Dependencies | No | The paper describes its algorithms and mathematical formulations but does not list any specific software dependencies, libraries, or frameworks with version numbers that would be needed to replicate the experiments.
Experiment Setup | Yes | In our simulations, we create a grid world of size 10 x 10. There are four actions representing the direction agent can move to: {up, down, left, right}. After each action, the agent moves to the nearby grid associated with the action with 70% chance and to a random nearby grid with 30% chance. The initial state is in the middle of the grid world. The time horizon T is set to be 20. We initialize the principal's reward function values to be uniformly drawn from the range [0, 0.5]. We then randomly choose a 2 x 2 block as global optimal region and add 0.5 to the reward values within this block. Similarly, we randomly draw 1 to 3 local optimal regions (2 x 2 blocks) by setting their reward lower than global optimal but higher than its neighbors. We randomly generate 1,000 environments following the above procedure and report the average results on these 1,000 environments. [...] We apply the algorithms in Section 3, with the soft-max parameter β = 3.
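Because no source code is released, the simulation setup quoted above has to be reimplemented from the description. The following is a minimal Python sketch of one possible reading, not the authors' implementation. Several details are assumptions that the summary does not state: the "middle" start cell is taken to be (5, 5), the 30% random move is drawn uniformly over all four directions, moves off the grid are clipped to the boundary, 2 x 2 blocks are not allowed to touch each other, each local block is set to the midpoint between its highest neighbor and the lowest global-block value, and the soft-max is applied to a generic vector of action values. All function and constant names are hypothetical.

```python
import numpy as np

GRID = 10                        # 10 x 10 grid world
HORIZON = 20                     # time horizon T
BETA = 3.0                       # soft-max parameter
START = (GRID // 2, GRID // 2)   # assumed "middle"; a 10 x 10 grid has no unique center
MOVES = np.array([(-1, 0), (1, 0), (0, -1), (0, 1)])  # up, down, left, right


def make_rewards(rng):
    """Sample one reward grid; returns (rewards, top-left corner of the global block)."""
    rewards = rng.uniform(0.0, 0.5, size=(GRID, GRID))
    occupied = np.zeros((GRID, GRID), dtype=bool)

    # Global optimal region: a random 2 x 2 block with 0.5 added.
    gr, gc = rng.integers(0, GRID - 1, size=2)
    rewards[gr:gr + 2, gc:gc + 2] += 0.5
    occupied[gr:gr + 2, gc:gc + 2] = True
    global_min = rewards[gr:gr + 2, gc:gc + 2].min()

    # 1 to 3 local optimal regions, placed by rejection sampling.
    n_local = rng.integers(1, 4)
    placed = 0
    while placed < n_local:
        r, c = rng.integers(0, GRID - 1, size=2)
        r0, r1 = max(r - 1, 0), min(r + 3, GRID)
        c0, c1 = max(c - 1, 0), min(c + 3, GRID)
        if occupied[r0:r1, c0:c1].any():
            continue  # assumed rule: blocks may not overlap or touch
        # Mask selecting the cells that surround the candidate block.
        ring = np.ones((r1 - r0, c1 - c0), dtype=bool)
        ring[r - r0:r - r0 + 2, c - c0:c - c0 + 2] = False
        neighbor_max = rewards[r0:r1, c0:c1][ring].max()
        # Assumed rule: midpoint between best neighbor and worst global cell.
        rewards[r:r + 2, c:c + 2] = 0.5 * (neighbor_max + global_min)
        occupied[r:r + 2, c:c + 2] = True
        placed += 1
    return rewards, (int(gr), int(gc))


def step(state, action, rng):
    """Intended move with probability 0.7, otherwise a uniformly random direction."""
    if rng.random() >= 0.7:
        action = rng.integers(0, 4)
    nxt = np.clip(np.array(state) + MOVES[action], 0, GRID - 1)
    return int(nxt[0]), int(nxt[1])


def softmax_policy(action_values, beta=BETA):
    """Soft-max choice probabilities over a vector of action values."""
    z = beta * (np.asarray(action_values, dtype=float) - np.max(action_values))
    p = np.exp(z)
    return p / p.sum()


rng = np.random.default_rng(0)
rewards, global_corner = make_rewards(rng)
state = START
for _ in range(HORIZON):
    state = step(state, 0, rng)  # always try to move up
```

Repeating make_rewards 1,000 times with fresh random draws would mirror the averaging over 1,000 environments described in the setup. The rejection step keeps the ordering "neighbors < local block < global block" well defined under the assumptions listed above; other readings of the description are possible.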