Fast Value Tracking for Deep Reinforcement Learning

Authors: Frank Shih, Faming Liang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section compares LKTD with prominent RL algorithms such as DQN, Boot DQN (Osband et al., 2016), QR-DQN (Bellemare et al., 2017), and KOVA (Shashua & Mannor, 2020). Using a simple indoor escape environment, we demonstrate the advantages of LKTD in three aspects: (1) accuracy of Q-value estimation, (2) uncertainty quantification of Q-values, and (3) optimal policy exploration. Furthermore, by employing more complex environments from OpenAI Gym, we demonstrate that LKTD is capable of learning better and more stable policies for both training and testing.
Researcher Affiliation | Academia | Frank Shih, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA (shih37@purdue.edu); Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA (fmliang@purdue.edu)
Pseudocode | Yes | Algorithm 1: Langevinized Kalman Temporal-Difference (LKTD); Algorithm S1: Extended Kalman Temporal-Difference Algorithm (KOVA Algorithm); Algorithm S2: Prototype of the LKTD Algorithm; Algorithm S3: SGLD for RL sampling framework; Algorithm S4: SGHMC for RL sampling framework. (An illustrative SGLD-style update is sketched after the table.)
Open Source Code | No | The paper mentions 'RL Baselines3 Zoo (Raffin, 2020)' and provides a link to its GitHub repository: https://github.com/DLR-RM/rl-baselines3-zoo. This is a third-party training framework used by the authors; no explicit statement or link is provided for the open-sourcing of the LKTD algorithm or the authors' own implementation code.
Open Datasets | Yes | This section evaluates LKTD's performance on four OpenAI Gym challenges: Acrobot-v1, CartPole-v1, LunarLander-v2, and MountainCar-v0, comparing it against DQN and QR-DQN based on the RL Baselines3 Zoo (Raffin, 2020) training framework. (A minimal training sketch is given after the table.)
Dataset Splits | No | The paper uses RL environments where data is generated through interaction, not fixed datasets. While it mentions training steps, batch sizes, and buffer sizes, it does not explicitly provide training/test/validation dataset splits with percentages, sample counts, or specific predefined split citations, as would be typical for static datasets.
Hardware Specification | Yes | In Table A3, we have recorded the average computation time required by each algorithm to execute a single parameter update, utilizing a 4-core AMD EPYC 7662 Rome processor.
Software Dependencies | No | The paper mentions using the 'RL Baselines3 Zoo (Raffin, 2020)' training framework and 'Open AI gym (Brockman et al., 2016)' but does not provide specific version numbers for these software components or any other libraries like PyTorch or TensorFlow.
Experiment Setup | Yes | The detailed hyperparameter settings are listed in Table A4 and Table A5. Each experiment is repeated 100 times, and the training progress is recorded in Figure 5 and Figure A4.
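
The Pseudocode row above lists Algorithm S3, an SGLD-based sampling framework for RL. The following is a minimal sketch of what an SGLD-style update on a TD objective could look like in PyTorch; the network, step size, discount factor, and batch format are placeholder assumptions, and the exact preconditioning and state augmentation of LKTD follow the paper's Algorithm 1, not this code.

```python
import torch


def sgld_td_update(q_net, target_net, batch, step_size=1e-4, gamma=0.99):
    """One SGLD-style update on a squared TD error (illustrative sketch only).

    Omits the prior term and mini-batch scaling that a full SGLD posterior
    sampler would include; it is not the paper's LKTD algorithm.
    """
    states, actions, rewards, next_states, dones = batch

    # TD target from a frozen target network, as in standard deep Q-learning.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = 0.5 * (q_values - targets).pow(2).mean()

    q_net.zero_grad()
    loss.backward()

    # Langevin step: a gradient move plus injected Gaussian noise with
    # standard deviation sqrt(step_size), so the iterates behave like
    # (approximate) posterior samples rather than a single point estimate.
    with torch.no_grad():
        for p in q_net.parameters():
            noise = torch.randn_like(p) * step_size ** 0.5
            p.add_(-0.5 * step_size * p.grad + noise)
```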
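For the Open Datasets row, the four Gym tasks can be run through the RL Baselines3 Zoo command-line interface or directly with stable-baselines3, which the Zoo wraps. The snippet below is a hedged example of training a DQN baseline on CartPole-v1; the hyperparameter values are placeholders, not the settings from Table A4/A5, and depending on installed versions the environment API may be `gym` rather than `gymnasium`.

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Placeholder hyperparameters; the paper's actual settings are in Tables A4-A5.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, learning_rate=1e-3,
            buffer_size=50_000, batch_size=64, verbose=0)
model.learn(total_timesteps=100_000)

# Roll out one evaluation episode with the greedy policy.
obs, _ = env.reset()
done, episode_return = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    episode_return += float(reward)
    done = terminated or truncated
print(f"episode return: {episode_return}")
```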