Environment Design for Biased Decision Makers
Authors: Guanghui Yu, Chien-Ju Ho
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct both simulations and real human-subject experiments with workers recruited from Amazon Mechanical Turk to evaluate our proposed algorithms. |
| Researcher Affiliation | Academia | Guanghui Yu and Chien-Ju Ho, Washington University in St. Louis, {guanghuiyu, chienju.ho}@wustl.edu |
| Pseudocode | Yes | Algorithm 1: Gradient-based Algorithm for Solving (4) |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper describes generating 1,000 environments for simulations and recruiting workers for human-subject experiments. It does not use or provide access information for a pre-existing, publicly available dataset in the context of training or evaluation. |
| Dataset Splits | No | The paper does not specify distinct training, validation, and test dataset splits with percentages or sample counts for reproducing experiments. It describes generating simulation environments and conducting human-subject experiments directly. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running its simulations or conducting human-subject experiments. |
| Software Dependencies | No | The paper describes its algorithms and mathematical formulations but does not list any specific software dependencies, libraries, or frameworks with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | In our simulations, we create a grid world of size 10 x 10. There are four actions representing the directions the agent can move: {up, down, left, right}. After each action, the agent moves to the nearby grid associated with the action with 70% chance and to a random nearby grid with 30% chance. The initial state is in the middle of the grid world. The time horizon T is set to be 20. We initialize the principal's reward function values to be uniformly drawn from the range [0, 0.5]. We then randomly choose a 2 x 2 block as the global optimal region and add 0.5 to the reward values within this block. Similarly, we randomly draw 1 to 3 local optimal regions (2 x 2 blocks) by setting their reward lower than the global optimum but higher than their neighbors. We randomly generate 1,000 environments following the above procedure and report the average results on these 1,000 environments. [...] We apply the algorithms in Section 3, with the soft-max parameter β = 3. |
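
Since the paper does not release code, the following Python sketch illustrates one plausible reading of the quoted setup: environment generation (base rewards, one global optimum, 1-3 local optima), the noisy transition, and the soft-max decision rule with β = 3. All names are our own, and details the paper leaves unspecified, such as the exact reward range used for local optima, are marked as assumptions in the comments.

```python
import numpy as np

GRID = 10          # grid world is 10 x 10
T = 20             # time horizon
BETA = 3.0         # soft-max parameter for the biased agent
P_INTENDED = 0.7   # chance the intended move succeeds

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def generate_environment(rng):
    """Generate one environment following the paper's described procedure."""
    # Base rewards drawn uniformly from [0, 0.5].
    reward = rng.uniform(0.0, 0.5, size=(GRID, GRID))
    # Global optimal region: add 0.5 to a random 2 x 2 block.
    r, c = rng.integers(0, GRID - 1, size=2)
    reward[r:r + 2, c:c + 2] += 0.5
    # 1 to 3 local optimal regions: rewards above typical neighbors but
    # below the global optimum. The [0.5, 0.8) range here is an assumption;
    # the paper only states the ordering constraint.
    for _ in range(rng.integers(1, 4)):
        lr, lc = rng.integers(0, GRID - 1, size=2)
        reward[lr:lr + 2, lc:lc + 2] = rng.uniform(0.5, 0.8, size=(2, 2))
    return reward


def step(state, action, rng):
    """Noisy transition: intended move with 70% chance, random move otherwise."""
    if rng.random() >= P_INTENDED:
        action = rng.choice(list(ACTIONS))
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), GRID - 1)
    c = min(max(state[1] + dc, 0), GRID - 1)
    return (r, c)


def softmax_policy(q_values, beta=BETA):
    """Biased agent's action distribution: soft-max over action values."""
    z = np.exp(beta * (q_values - q_values.max()))  # shift for stability
    return z / z.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    env = generate_environment(rng)
    state = (GRID // 2, GRID // 2)  # initial state in the middle of the grid
    print(env.shape, state)
```

To mirror the paper's evaluation protocol, one would call `generate_environment` 1,000 times with distinct seeds and average the resulting metrics across environments.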