Admissible Policy Teaching through Reward Design
Authors: Kiarash Banihashem, Adish Singla, Jiarui Gan, Goran Radanovic
Pages: 6037-6045
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design a local search algorithm to solve the surrogate problem and showcase its utility using simulation-based experiments. |
| Researcher Affiliation | Academia | Max Planck Institute for Software Systems {kbanihas, adishs, jrgan, gradanovic}@mpi-sws.org |
| Pseudocode | Yes | Algorithm 1. CONSTRAIN&OPTIMIZE |
| Open Source Code | No | For details regarding the experiments and code, please refer to the full version of our paper (Banihashem et al. 2022). |
| Open Datasets | No | As an experimental testbed, we consider three simple navigation environments, shown in Figure 2. |
| Dataset Splits | No | The paper describes custom environments and mentions parameters, but does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running experiments. |
| Software Dependencies | No | The paper does not specify software names with version numbers. |
| Experiment Setup | Yes | By default, we set the parameters γ = 0.9, λ = 1.0 and ϵ = 0.1. |