Fast Value Tracking for Deep Reinforcement Learning

Authors: Frank Shih, Faming Liang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section compares LKTD with prominent RL algorithms such as DQN, Boot DQN (Osband et al., 2016), QR-DQN (Bellemare et al., 2017), and KOVA (Shashua & Mannor, 2020). Using a simple indoor escape environment, we demonstrate the advantages of LKTD in three aspects: (1) accuracy of Q-value estimation, (2) uncertainty quantification of Q-values, and (3) optimal policy exploration. Furthermore, by employing more complex environments from OpenAI Gym, we demonstrate that LKTD is capable of learning better and more stable policies for both training and testing.
Researcher Affiliation | Academia | Frank Shih, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA (shih37@purdue.edu); Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA (fmliang@purdue.edu)
Pseudocode | Yes | Algorithm 1: Langevinized Kalman Temporal-Difference (LKTD); Algorithm S1: Extended Kalman Temporal-Difference Algorithm (KOVA Algorithm); Algorithm S2: Prototype of the LKTD Algorithm; Algorithm S3: SGLD for RL sampling framework; Algorithm S4: SGHMC for RL sampling framework. (An illustrative SGLD-style update is sketched after the table.)
Open Source Code | No | The paper mentions 'RL Baselines3 Zoo (Raffin, 2020)' and provides a link to its GitHub repository: https://github.com/DLR-RM/rl-baselines3-zoo. This is a third-party training framework used by the authors; no explicit statement or link is provided for the open-sourcing of the LKTD algorithm or the authors' own implementation code.
Open Datasets | Yes | This section evaluates LKTD's performance on four OpenAI Gym challenges: Acrobot-v1, CartPole-v1, LunarLander-v2, and MountainCar-v0, comparing it against DQN and QR-DQN based on the RL Baselines3 Zoo (Raffin, 2020) training framework. (A minimal training sketch is given after the table.)
Dataset Splits | No | The paper uses RL environments where data is generated through interaction, not fixed datasets. While it mentions training steps, batch sizes, and buffer sizes, it does not explicitly provide training/test/validation dataset splits with percentages, sample counts, or specific predefined split citations, as would be typical for static datasets.
Hardware Specification | Yes | In Table A3, we have recorded the average computation time required by each algorithm to execute a single parameter update, utilizing a 4-core AMD EPYC 7662 Rome processor.
Software Dependencies | No | The paper mentions using the 'RL Baselines3 Zoo (Raffin, 2020)' training framework and 'Open AI gym (Brockman et al., 2016)' but does not provide specific version numbers for these software components or any other libraries like PyTorch or TensorFlow.
Experiment Setup | Yes | The detailed hyperparameter settings are listed in Table A4 and Table A5. Each experiment is repeated 100 times, and the training progress is recorded in Figure 5 and Figure A4.
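
The Pseudocode row above lists Algorithm S3, an SGLD-based sampling framework for RL. The following is a minimal sketch of what an SGLD-style update on a TD objective could look like in PyTorch; the network, step size, discount factor, and batch format are placeholder assumptions, and the exact preconditioning and state augmentation of LKTD follow the paper's Algorithm 1, not this code.

```python
import torch


def sgld_td_update(q_net, target_net, batch, step_size=1e-4, gamma=0.99):
    """One SGLD-style update on a squared TD error (illustrative sketch only).

    Omits the prior term and mini-batch scaling that a full SGLD posterior
    sampler would include; it is not the paper's LKTD algorithm.
    """
    states, actions, rewards, next_states, dones = batch

    # TD target from a frozen target network, as in standard deep Q-learning.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = 0.5 * (q_values - targets).pow(2).mean()

    q_net.zero_grad()
    loss.backward()

    # Langevin step: a gradient move plus injected Gaussian noise with
    # standard deviation sqrt(step_size), so the iterates behave like
    # (approximate) posterior samples rather than a single point estimate.
    with torch.no_grad():
        for p in q_net.parameters():
            noise = torch.randn_like(p) * step_size ** 0.5
            p.add_(-0.5 * step_size * p.grad + noise)
```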
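For the Open Datasets row, the four Gym tasks can be run through the RL Baselines3 Zoo command-line interface or directly with stable-baselines3, which the Zoo wraps. The snippet below is a hedged example of training a DQN baseline on CartPole-v1; the hyperparameter values are placeholders, not the settings from Table A4/A5, and depending on installed versions the environment API may be `gym` rather than `gymnasium`.

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Placeholder hyperparameters; the paper's actual settings are in Tables A4-A5.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, learning_rate=1e-3,
            buffer_size=50_000, batch_size=64, verbose=0)
model.learn(total_timesteps=100_000)

# Roll out one evaluation episode with the greedy policy.
obs, _ = env.reset()
done, episode_return = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    episode_return += float(reward)
    done = terminated or truncated
print(f"episode return: {episode_return}")
```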