Dynamic Action Repetition for Deep Reinforcement Learning

Authors: Aravind S. Lakshminarayanan, Sahil Sharma, Balaraman Ravindran

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We show empirically that such a dynamic time scale mechanism improves the performance on relatively harder games in the Atari 2600 domain, independent of the underlying Deep Reinforcement Learning algorithm used.' (Experimental Setup and Results; a sketch of this mechanism follows the table)
Researcher Affiliation | Academia | Aravind S. Lakshminarayanan, Sahil Sharma, Balaraman Ravindran (Indian Institute of Technology, Madras)
Pseudocode | No | The paper describes algorithms such as Q-Learning, DQN, and A3C in prose, but it does not include any structured pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | No | The paper states 'We used the LSTM-controller and the best-performing open source implementation of A3C algorithm.', which indicates the authors built on existing open-source code, but it does not provide a link to their own code or a statement that it is open source. The links provided point to videos of learned policies.
Open Datasets | Yes | 'Video game domains such as Mario (Togelius, Karakovskiy, and Baumgarten 2010), Atari 2600 (Bellemare et al. 2013) and Half Field Offensive (Hausknecht et al. 2016) have served as a test bed to measure performance of learning algorithms in AI-based game playing.'
Dataset Splits | No | The paper states 'A training epoch consists of 250000 steps (action selections). This is followed by a testing epoch which consists of 125000 steps.' and reports scores for the best testing epoch, but it does not describe a separate validation split or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used to run the experiments.
Software Dependencies | No | The paper refers to an LSTM controller and to DQN and A3C implementations, but it does not give version numbers for any software dependencies or libraries (e.g., TensorFlow, PyTorch, Python, or the game emulator).
Experiment Setup | Yes | 'The values of r1, r2 (defined in the previous section) are kept the same for all three games and are equal to 4 and 20 respectively. To ensure sufficient exploration (given that the augmented model has double the number of actions), the exploration probability ϵ is annealed from 1 to 0.1 over 2 million steps as against 1 million steps used in DQN. Therefore, the augmented model has 1024 units in the pre-output hidden layer as compared to 512 units used in the DQN architecture (Mnih et al. 2015).' (a sketch of this augmented architecture and annealing schedule also follows the table)
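The Research Type and Experiment Setup rows describe the paper's core mechanism: every primitive action is duplicated with two repetition rates, r1 = 4 and r2 = 20, so the agent chooses both which action to take and how long to hold it. The following is a minimal sketch of that idea as an environment wrapper; it is not the authors' code, and the Gym-style interface, the classic four-tuple step API, and the class and variable names are assumptions.

```python
# Minimal sketch (assumed names, classic Gym 4-tuple step API) of an
# augmented action space for dynamic action repetition: the first |A|
# actions repeat the primitive action r1 times, the next |A| repeat it r2 times.
import gym


class DynamicActionRepetitionWrapper(gym.Wrapper):
    """Doubles a discrete action space with two repetition rates."""

    def __init__(self, env, r1=4, r2=20):
        super().__init__(env)
        self.n_primitive = env.action_space.n
        self.repeat_rates = (r1, r2)
        # Augmented action space has 2 * |A| discrete choices.
        self.action_space = gym.spaces.Discrete(2 * self.n_primitive)

    def step(self, augmented_action):
        primitive = augmented_action % self.n_primitive
        repeat = self.repeat_rates[augmented_action // self.n_primitive]
        total_reward, done = 0.0, False
        for _ in range(repeat):
            obs, reward, done, info = self.env.step(primitive)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done, info
```

Any standard DQN or A3C agent can then be trained on the wrapped environment unchanged, since from the agent's point of view it simply faces a larger discrete action set.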
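The Experiment Setup row also notes that the augmented model follows the DQN architecture of Mnih et al. (2015) but widens the pre-output hidden layer to 1024 units (to accommodate the doubled action set) and anneals ϵ from 1 to 0.1 over 2 million steps. The sketch below renders that description in PyTorch; the framework choice, module names, and the standard 84x84 four-frame input are assumptions rather than details given in the paper.

```python
# Assumed PyTorch rendering of the augmented DQN described above:
# Nature-DQN conv stack, a 1024-unit pre-output layer (vs. 512 in DQN),
# and 2 * |A| Q-value outputs for the doubled action set.
import torch
import torch.nn as nn


class AugmentedDQN(nn.Module):
    def __init__(self, n_primitive_actions, hidden=1024):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, hidden), nn.ReLU(),    # 1024 units vs. 512 in DQN
            nn.Linear(hidden, 2 * n_primitive_actions),  # Q-values for doubled action set
        )

    def forward(self, x):
        # x: batch of 4 stacked 84x84 grayscale frames, scaled to [0, 1].
        return self.head(self.conv(x / 255.0))


def epsilon(step, eps_start=1.0, eps_end=0.1, anneal_steps=2_000_000):
    """Linear annealing of the exploration probability over 2 million steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```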