Programmatic Reward Design by Example
Authors: Weichao Zhou, Wenchao Li
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that programmatic reward functions learned using this framework can significantly outperform those learned using existing reward learning algorithms, and enable RL agents to achieve state-of-the-art performance on highly complex tasks. |
| Researcher Affiliation | Academia | Weichao Zhou and Wenchao Li Boston University {zwc662,wenchao}@bu.edu |
| Pseudocode | Yes | Algorithm 1: Generative Adversarial PRDBE (an illustrative, hedged sketch follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. It only references a third-party environment: 'https://github.com/maximecb/gym-minigrid'. |
| Open Datasets | Yes | We select from the MiniGrid environments (Chevalier-Boisvert, Willems, and Pal 2018) three challenging RL tasks... Chevalier-Boisvert, M.; Willems, L.; and Pal, S. 2018. Minimalistic Gridworld Environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid. Accessed: 2022-04-25. (A usage sketch follows the table.) |
| Dataset Splits | No | The paper describes using a certain number of 'example trajectories' (e.g., '10 example trajectories demonstrated in a DoorKey-8x8 environment') for learning, but does not specify explicit train/validation/test splits (as percentages or counts) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several algorithms and environments (e.g., PPO, AGAC, Mini Grid) but does not provide specific version numbers for any ancillary software dependencies or libraries (e.g., Python, PyTorch, Gym versions) needed for reproduction. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs, optimizer settings) or detailed training configurations in the main text. |
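
The "Pseudocode" row names Algorithm 1 (Generative Adversarial PRDBE), but the paper's listing is not reproduced here. Below is a minimal, hypothetical sketch of a generic adversarial reward-learning loop in that spirit: a linear reward over one-hot features stands in for the paper's richer "programmatic" reward functions, a toy chain MDP stands in for the MiniGrid tasks, and the alternating policy/reward updates are standard apprenticeship-style steps. None of this is the authors' exact procedure; every name and update rule here is an illustrative assumption.

```python
# Hypothetical sketch of an adversarial reward-design loop. NOT the paper's
# Algorithm 1: the MDP, features, and update rules are all assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HORIZON = 5, 2, 10  # toy chain MDP; action 1 moves right

def phi(s):
    """One-hot state features; the learned reward is linear in phi, a crude
    stand-in for a 'programmatic' reward function."""
    f = np.zeros(N_STATES)
    f[s] = 1.0
    return f

def rollout(logits):
    """Sample the state-visit sequence of one episode under a softmax policy."""
    s, visited = 0, []
    for _ in range(HORIZON):
        p = np.exp(logits[s] - logits[s].max())
        p /= p.sum()
        a = rng.choice(N_ACTIONS, p=p)
        visited.append(s)
        s = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return visited

def feature_counts(trajs):
    """Average per-trajectory feature counts (empirical state visitation)."""
    return np.mean([np.sum([phi(s) for s in t], axis=0) for t in trajs], axis=0)

def avg_return(logits, w, n=8):
    """Monte-Carlo estimate of the policy's return under the learned reward."""
    return np.mean([sum(phi(s) @ w for s in rollout(logits)) for _ in range(n)])

# "Example trajectories": a hand-coded expert that always moves right.
expert_trajs = [[min(t, N_STATES - 1) for t in range(HORIZON)] for _ in range(10)]
mu_expert = feature_counts(expert_trajs)

w = np.zeros(N_STATES)                    # reward parameters
logits = np.zeros((N_STATES, N_ACTIONS))  # policy parameters

for _ in range(200):
    # (1) Policy step: crude hill-climbing against the current learned reward
    #     (a stand-in for the RL inner loop, e.g. PPO in the paper).
    for _ in range(5):
        cand = logits + 0.5 * rng.standard_normal(logits.shape)
        if avg_return(cand, w) >= avg_return(logits, w):
            logits = cand
    # (2) Adversarial reward step: raise reward on features the examples
    #     visit, lower it on features the agent visits, until they match.
    mu_agent = feature_counts([rollout(logits) for _ in range(16)])
    w += 0.1 * (mu_expert - mu_agent)

print("learned per-state reward:", np.round(w, 2))
```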
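For the "Open Datasets" row, the referenced gym-minigrid repository registers its tasks as standard Gym environments. The snippet below shows how the DoorKey-8x8 task mentioned above would typically be instantiated; the environment ID and the reset/step signatures are assumptions that depend on the installed versions (the API changed in gym 0.26 and again in the later Farama "minigrid" package).

```python
# Hypothetical usage sketch for the gym-minigrid environments cited above.
# Assumes gym < 0.26 and the maximecb/gym-minigrid package of the paper's era.
import gym
import gym_minigrid  # noqa: F401 -- importing registers the MiniGrid-* envs

env = gym.make('MiniGrid-DoorKey-8x8-v0')  # one of the tasks used in the paper
obs = env.reset()
done = False
while not done:
    # Random actions stand in for a trained policy.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```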