Dynamic Action Repetition for Deep Reinforcement Learning

Authors: Aravind S. Lakshminarayanan, Sahil Sharma, Balaraman Ravindran

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We show empirically that such a dynamic time scale mechanism improves the performance on relatively harder games in the Atari 2600 domain, independent of the underlying Deep Reinforcement Learning algorithm used.' (Experimental Setup and Results; a sketch of this mechanism follows the table)
Researcher Affiliation | Academia | Aravind S. Lakshminarayanan, Sahil Sharma, Balaraman Ravindran (Indian Institute of Technology, Madras)
Pseudocode | No | The paper describes algorithms such as Q-Learning, DQN, and A3C in prose, but it does not include any structured pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | No | The paper states 'We used the LSTM-controller and the best-performing open source implementation of A3C algorithm.', which indicates the authors built on existing open-source code, but it does not provide a link to their own code or a statement that it is open source. The links provided point to videos of learned policies.
Open Datasets | Yes | 'Video game domains such as Mario (Togelius, Karakovskiy, and Baumgarten 2010), Atari 2600 (Bellemare et al. 2013) and Half Field Offensive (Hausknecht et al. 2016) have served as a test bed to measure performance of learning algorithms in AI-based game playing.'
Dataset Splits | No | The paper states 'A training epoch consists of 250000 steps (action selections). This is followed by a testing epoch which consists of 125000 steps.' and reports scores for the best testing epoch, but it does not describe a separate validation split or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used to run the experiments.
Software Dependencies | No | The paper refers to an LSTM controller and to DQN and A3C implementations, but it does not give version numbers for any software dependencies or libraries (e.g., TensorFlow, PyTorch, Python, or the game emulator).
Experiment Setup | Yes | 'The values of r1, r2 (defined in the previous section) are kept the same for all three games and are equal to 4 and 20 respectively. To ensure sufficient exploration (given that the augmented model has double the number of actions), the exploration probability ϵ is annealed from 1 to 0.1 over 2 million steps as against 1 million steps used in DQN. Therefore, the augmented model has 1024 units in the pre-output hidden layer as compared to 512 units used in the DQN architecture (Mnih et al. 2015).' (a sketch of this augmented architecture and annealing schedule also follows the table)
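The Research Type and Experiment Setup rows describe the paper's core mechanism: every primitive action is duplicated with two repetition rates, r1 = 4 and r2 = 20, so the agent chooses both which action to take and how long to hold it. The following is a minimal sketch of that idea as an environment wrapper; it is not the authors' code, and the Gym-style interface, the classic four-tuple step API, and the class and variable names are assumptions.

```python
# Minimal sketch (assumed names, classic Gym 4-tuple step API) of an
# augmented action space for dynamic action repetition: the first |A|
# actions repeat the primitive action r1 times, the next |A| repeat it r2 times.
import gym


class DynamicActionRepetitionWrapper(gym.Wrapper):
    """Doubles a discrete action space with two repetition rates."""

    def __init__(self, env, r1=4, r2=20):
        super().__init__(env)
        self.n_primitive = env.action_space.n
        self.repeat_rates = (r1, r2)
        # Augmented action space has 2 * |A| discrete choices.
        self.action_space = gym.spaces.Discrete(2 * self.n_primitive)

    def step(self, augmented_action):
        primitive = augmented_action % self.n_primitive
        repeat = self.repeat_rates[augmented_action // self.n_primitive]
        total_reward, done = 0.0, False
        for _ in range(repeat):
            obs, reward, done, info = self.env.step(primitive)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done, info
```

Any standard DQN or A3C agent can then be trained on the wrapped environment unchanged, since from the agent's point of view it simply faces a larger discrete action set.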
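The Experiment Setup row also notes that the augmented model follows the DQN architecture of Mnih et al. (2015) but widens the pre-output hidden layer to 1024 units (to accommodate the doubled action set) and anneals ϵ from 1 to 0.1 over 2 million steps. The sketch below renders that description in PyTorch; the framework choice, module names, and the standard 84x84 four-frame input are assumptions rather than details given in the paper.

```python
# Assumed PyTorch rendering of the augmented DQN described above:
# Nature-DQN conv stack, a 1024-unit pre-output layer (vs. 512 in DQN),
# and 2 * |A| Q-value outputs for the doubled action set.
import torch
import torch.nn as nn


class AugmentedDQN(nn.Module):
    def __init__(self, n_primitive_actions, hidden=1024):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, hidden), nn.ReLU(),    # 1024 units vs. 512 in DQN
            nn.Linear(hidden, 2 * n_primitive_actions),  # Q-values for doubled action set
        )

    def forward(self, x):
        # x: batch of 4 stacked 84x84 grayscale frames, scaled to [0, 1].
        return self.head(self.conv(x / 255.0))


def epsilon(step, eps_start=1.0, eps_end=0.1, anneal_steps=2_000_000):
    """Linear annealing of the exploration probability over 2 million steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```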