Programmatic Reward Design by Example

Authors: Weichao Zhou, Wenchao Li

AAAI 2022, pp. 9233-9241

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that programmatic reward functions learned using this framework can significantly outperform those learned using existing reward learning algorithms, and enable RL agents to achieve state-of-the-art performance on highly complex tasks.
Researcher Affiliation | Academia | Weichao Zhou and Wenchao Li, Boston University, {zwc662,wenchao}@bu.edu
Pseudocode | Yes | Algorithm 1: Generative Adversarial PRDBE (a hedged sketch of the adversarial pattern this name implies appears after the table).
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. It only references a third-party environment: https://github.com/maximecb/gym-minigrid.
Open Datasets | Yes | We select from the MiniGrid environments (Chevalier-Boisvert, Willems, and Pal 2018) three challenging RL tasks... Chevalier-Boisvert, M.; Willems, L.; and Pal, S. 2018. Minimalistic Gridworld Environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid. Accessed: 2022-04-25. (A loading sketch for these environments appears after the table.)
Dataset Splits | No | The paper describes using a certain number of 'example trajectories' (e.g., '10 example trajectories demonstrated in a Door Key-8x8 environment') for learning, but does not specify explicit train/validation/test splits with percentages or counts that would reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions several algorithms and environments (e.g., PPO, AGAC, MiniGrid) but does not provide specific version numbers for any ancillary software dependencies or libraries (e.g., Python, PyTorch, Gym versions) needed for reproduction.
Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs, optimizer settings) or detailed training configurations in the main text.
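Since the paper's Algorithm 1 (Generative Adversarial PRDBE) is only named above and not reproduced, the following is a minimal, hedged sketch of the generic adversarial reward-learning pattern that the name implies: a reward ("discriminator") is pushed to score demonstrated state-action pairs above those visited by the current policy, while the policy ("generator") is updated to maximize the learned reward. The toy MDP, parameterization, and update rules are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Minimal sketch of a GAIL-style adversarial reward-learning loop.
# NOTE: this is NOT the paper's Algorithm 1; the toy setting and
# update rules below are illustrative assumptions only.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

# (state, action) pairs standing in for the paper's example trajectories.
demos = [(s, s % n_actions) for s in range(n_states)]

theta = np.zeros((n_states, n_actions))          # learned reward ("discriminator")
policy_logits = np.zeros((n_states, n_actions))  # policy ("generator")

def sample_policy_pairs(k=32):
    """Sample (state, action) pairs from the current softmax policy."""
    states = rng.integers(n_states, size=k)
    probs = np.exp(policy_logits[states])
    probs /= probs.sum(axis=1, keepdims=True)
    actions = np.array([rng.choice(n_actions, p=p) for p in probs])
    return list(zip(states, actions))

for _ in range(200):
    # Reward step: raise the learned reward on demonstrated pairs and
    # lower it on pairs visited by the current policy.
    for s, a in demos:
        theta[s, a] += 0.1
    for s, a in sample_policy_pairs():
        theta[s, a] -= 0.1 * len(demos) / 32
    # Policy step: a crude gradient-like move toward high-reward actions.
    policy_logits += 0.05 * (theta - theta.mean(axis=1, keepdims=True))

print("greedy action per state:", policy_logits.argmax(axis=1))
print("demonstrated action per state:", [a for _, a in demos])
```

After a few hundred alternations the greedy policy recovers the demonstrated action in each state, which is the qualitative behavior any adversarial reward-learning scheme of this shape aims for.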
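For the gym-minigrid dependency cited above, importing the package registers its environment IDs with Gym. Below is a minimal loading sketch for the DoorKey-8x8 task named in the paper, assuming the legacy Gym reset/step API that 2018-era gym-minigrid targeted (exact IDs and API may differ across versions):

```python
import gym
import gym_minigrid  # importing registers the MiniGrid-* environment IDs with Gym

# One of the MiniGrid tasks the paper draws on; the ID follows
# gym-minigrid's registration scheme.
env = gym.make('MiniGrid-DoorKey-8x8-v0')

obs = env.reset()  # legacy Gym API: reset() returns only the observation
done = False
while not done:
    action = env.action_space.sample()          # random policy, for illustration
    obs, reward, done, info = env.step(action)  # legacy 4-tuple step API
env.close()
```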