How Should an Agent Practice?

Authors: Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh

AAAI 2020, pp. 5454-5461

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the method on a simple grid world, and evaluate it in two games in which the practice environment differs from match: Pong with practice against a wall without an opponent, and Pac Man with practice in a maze without ghosts. The results show gains from learning in practice in addition to match periods over learning in matches only.
Researcher Affiliation | Academia | University of Michigan {rjana, rickl, vveeriah, baveja}@umich.edu, honglak@eecs.umich.edu
Pseudocode | Yes | Algorithm 1: Learning Practice Rewards
Open Source Code | No | The paper states: 'The learning agent uses the open-source implementation of the A2C algorithm (Mnih et al. 2016) from Open AI (Dhariwal et al. 2017) for the two games.' This refers to a third-party library; the paper does not state that the code for *their* method is open-source, nor does it provide a link to it.
Open Datasets | Yes | The two domains used for our evaluation are Pong and Pac Man. The extrinsic reward provided to the agent during match is the change in game score, as is standard in work on Atari games.
Dataset Splits | No | The paper describes 'practice' and 'match' periods for learning and evaluation within the game environments, but does not provide specific train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions using 'OpenAI Baselines' for the A2C algorithm but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | Input: step-size parameters α_m, α_p, and β. Each episode in both practice and match is of length between 45 and 50, sampled uniformly. The agent undergoes 3 practice episodes before every match episode. The agent practices in this modified practice environment for 3000 time steps after every match.
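The Pseudocode and Experiment Setup rows describe an alternating practice/match loop: episodes of 45 to 50 steps, 3 practice episodes before every match episode, policy step sizes α_m (match) and α_p (practice), and a step size β for the learned practice-reward parameters. The following is a minimal sketch of that schedule only, not the paper's implementation: a toy chain environment and a tabular REINFORCE learner stand in for the A2C agents, and a crude finite-difference update of the practice-reward parameters stands in for the analytic meta-gradient of Algorithm 1. All names (env_step, run_episode, practice_then_match) and any constants beyond those quoted above are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's domains (grid world / Pong / Pac Man):
# a short chain where the agent scores by reaching the right end during a match.
N_STATES, N_ACTIONS = 6, 2
ALPHA_M, ALPHA_P, BETA = 0.1, 0.1, 0.05   # step sizes alpha_m, alpha_p, beta

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def env_step(s, a, match):
    """Action 1 moves right, action 0 moves left; extrinsic reward only in match."""
    s2 = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
    r_ext = 1.0 if (match and s2 == N_STATES - 1) else 0.0
    return s2, r_ext

def run_episode(theta, eta, match, step_size):
    """One REINFORCE episode of length sampled uniformly in [45, 50].
    Match episodes learn from the extrinsic reward; practice episodes learn
    from the learned practice reward eta[s]."""
    length = rng.integers(45, 51)
    s, traj, score = 0, [], 0.0
    for _ in range(length):
        p = softmax(theta[s])
        a = rng.choice(N_ACTIONS, p=p)
        s2, r_ext = env_step(s, a, match)
        r = r_ext if match else eta[s2]        # reward that drives this update
        grad = np.zeros_like(theta)
        grad[s] = -p
        grad[s, a] += 1.0                      # grad of log pi(a | s) w.r.t. theta[s]
        traj.append((grad, r))
        score += r_ext
        s = s2
    returns = np.cumsum([r for _, r in traj][::-1])[::-1]   # return-to-go
    theta = theta + step_size * sum(G * g for G, (g, _) in zip(returns, traj))
    return theta, score

def practice_then_match(theta, eta):
    """3 practice episodes before every match episode, as in the setup row."""
    for _ in range(3):
        theta, _ = run_episode(theta, eta, match=False, step_size=ALPHA_P)
    return run_episode(theta, eta, match=True, step_size=ALPHA_M)

theta = np.zeros((N_STATES, N_ACTIONS))   # policy parameters
eta = np.zeros(N_STATES)                  # practice-reward parameters, one per state
for cycle in range(200):
    # Finite-difference surrogate for the paper's meta-gradient on eta:
    # nudge one practice-reward parameter and move it toward higher match score.
    i = cycle % N_STATES
    eps = np.zeros(N_STATES)
    eps[i] = 0.1
    _, up = practice_then_match(theta, eta + eps)
    _, down = practice_then_match(theta, eta - eps)
    eta[i] += BETA * (up - down)
    theta, _ = practice_then_match(theta, eta)
```

The structural point the sketch preserves is the one the table rows describe: the practice reward eta is evaluated only through the match score it ultimately produces, while the policy parameters theta are updated with α_p during practice episodes and α_m during match episodes.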