How Should an Agent Practice?

Authors: Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh

AAAI 2020, pp. 5454-5461

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the method on a simple grid world, and evaluate it in two games in which the practice environment differs from match: Pong with practice against a wall without an opponent, and Pac Man with practice in a maze without ghosts. The results show gains from learning in practice in addition to match periods over learning in matches only.
Researcher Affiliation | Academia | University of Michigan {rjana, rickl, vveeriah, baveja}@umich.edu, honglak@eecs.umich.edu
Pseudocode | Yes | Algorithm 1: Learning Practice Rewards
Open Source Code | No | The paper states: 'The learning agent uses the open-source implementation of the A2C algorithm (Mnih et al. 2016) from Open AI (Dhariwal et al. 2017) for the two games.' This refers to a third-party library; the paper does not state that the code for *their* method is open-source, nor does it provide a link to it.
Open Datasets | Yes | The two domains used for our evaluation are Pong and Pac Man. The extrinsic reward provided to the agent during match is the change in game score, as is standard in work on Atari games.
Dataset Splits | No | The paper describes 'practice' and 'match' periods for learning and evaluation within the game environments, but does not provide specific train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions using 'OpenAI Baselines' for the A2C algorithm but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | Input: step-size parameters α_m, α_p, and β. Each episode in both practice and match is of length between 45 and 50, sampled uniformly. The agent undergoes 3 practice episodes before every match episode. The agent practices in this modified practice environment for 3000 time steps after every match.
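The Pseudocode and Experiment Setup rows describe an alternating practice/match loop: episodes of 45 to 50 steps, 3 practice episodes before every match episode, policy step sizes α_m (match) and α_p (practice), and a step size β for the learned practice-reward parameters. The following is a minimal sketch of that schedule only, not the paper's implementation: a toy chain environment and a tabular REINFORCE learner stand in for the A2C agents, and a crude finite-difference update of the practice-reward parameters stands in for the analytic meta-gradient of Algorithm 1. All names (env_step, run_episode, practice_then_match) and any constants beyond those quoted above are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's domains (grid world / Pong / Pac Man):
# a short chain where the agent scores by reaching the right end during a match.
N_STATES, N_ACTIONS = 6, 2
ALPHA_M, ALPHA_P, BETA = 0.1, 0.1, 0.05   # step sizes alpha_m, alpha_p, beta

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def env_step(s, a, match):
    """Action 1 moves right, action 0 moves left; extrinsic reward only in match."""
    s2 = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
    r_ext = 1.0 if (match and s2 == N_STATES - 1) else 0.0
    return s2, r_ext

def run_episode(theta, eta, match, step_size):
    """One REINFORCE episode of length sampled uniformly in [45, 50].
    Match episodes learn from the extrinsic reward; practice episodes learn
    from the learned practice reward eta[s]."""
    length = rng.integers(45, 51)
    s, traj, score = 0, [], 0.0
    for _ in range(length):
        p = softmax(theta[s])
        a = rng.choice(N_ACTIONS, p=p)
        s2, r_ext = env_step(s, a, match)
        r = r_ext if match else eta[s2]        # reward that drives this update
        grad = np.zeros_like(theta)
        grad[s] = -p
        grad[s, a] += 1.0                      # grad of log pi(a | s) w.r.t. theta[s]
        traj.append((grad, r))
        score += r_ext
        s = s2
    returns = np.cumsum([r for _, r in traj][::-1])[::-1]   # return-to-go
    theta = theta + step_size * sum(G * g for G, (g, _) in zip(returns, traj))
    return theta, score

def practice_then_match(theta, eta):
    """3 practice episodes before every match episode, as in the setup row."""
    for _ in range(3):
        theta, _ = run_episode(theta, eta, match=False, step_size=ALPHA_P)
    return run_episode(theta, eta, match=True, step_size=ALPHA_M)

theta = np.zeros((N_STATES, N_ACTIONS))   # policy parameters
eta = np.zeros(N_STATES)                  # practice-reward parameters, one per state
for cycle in range(200):
    # Finite-difference surrogate for the paper's meta-gradient on eta:
    # nudge one practice-reward parameter and move it toward higher match score.
    i = cycle % N_STATES
    eps = np.zeros(N_STATES)
    eps[i] = 0.1
    _, up = practice_then_match(theta, eta + eps)
    _, down = practice_then_match(theta, eta - eps)
    eta[i] += BETA * (up - down)
    theta, _ = practice_then_match(theta, eta)
```

The structural point the sketch preserves is the one the table rows describe: the practice reward eta is evaluated only through the match score it ultimately produces, while the policy parameters theta are updated with α_p during practice episodes and α_m during match episodes.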