Diagnosing Bottlenecks in Deep Q-learning Algorithms

Authors: Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we aim to experimentally investigate potential issues in Q-learning, by means of a unit testing framework where we can utilize oracles to disentangle sources of error. Specifically, we investigate questions related to function approximation, sampling error and nonstationarity, and where available, verify if trends found in oracle settings hold true with deep RL methods."
Researcher Affiliation | Academia | "Justin Fu*, Aviral Kumar*, Matthew Soh, Sergey Levine (UC Berkeley). Correspondence to: Justin Fu <justinjfu@eecs.berkeley.edu>, Aviral Kumar <aviralk@berkeley.edu>."
Pseudocode | Yes | Algorithm 1: Exact-FQI, Algorithm 2: Sampled-FQI, and Algorithm 3: Replay-FQI (all in Section 4).
Open Source Code | No | No explicit statement providing access to source code for the methodology described in this paper, nor a direct link to a code repository, was found.
Open Datasets | Yes | "We selected 8 tabular domains, each with different qualitative attributes, including: gridworlds of varying sizes and observations, blind Cliffwalk (Schaul et al., 2015), discretized Pendulum and Mountain Car based on OpenAI Gym (Plappert et al., 2018), and a sparsely connected graph."
Dataset Splits | No | The paper mentions measuring "validation errors" and "on-policy validation loss" (Section 5.1, Figure 3) but does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, and testing.
Hardware Specification | No | The acknowledgements state: "We thank Google, NVIDIA, and Amazon for providing computational resources." However, no specific hardware details such as GPU/CPU models, processor types, or memory amounts are provided for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8") are mentioned in the paper.
Experiment Setup | Yes | "Throughout our experiments, we use 2-layer ReLU networks, denoted by a tuple (N, N) where N represents the number of units in a layer."
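The pseudocode row above names three fitted Q-iteration variants (Exact-FQI, Sampled-FQI, Replay-FQI). As a rough illustration of the common core, here is a minimal tabular sketch of the exact (oracle-backup) variant: every (state, action) pair is backed up each iteration using full knowledge of the dynamics. The function name `exact_fqi`, the toy MDP below, and all parameter choices are our own illustration, not taken from the paper; the paper's actual algorithms use neural network function approximators, where the projection step is a regression rather than an exact assignment.

```python
import numpy as np

def exact_fqi(P, R, gamma=0.9, iters=300):
    """Exact tabular fitted Q-iteration (illustrative sketch).

    P: (S, A, S) transition probabilities, P[s, a, s'] = Pr(s' | s, a)
    R: (S, A) expected immediate rewards
    Returns the (S, A) Q-value table after `iters` full Bellman backups.
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Bellman optimality backup over ALL (s, a) pairs at once:
        # target(s, a) = R(s, a) + gamma * E_{s'}[max_a' Q(s', a')]
        target = R + gamma * P @ Q.max(axis=1)
        # With a tabular "function class" the projection onto the class
        # is exact, so the new Q is simply the backup target.
        Q = target
    return Q
```

With function approximation, the assignment `Q = target` would instead become a supervised regression of the network onto `target`, which is exactly the step where the paper's function-approximation and sampling errors enter.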