Diagnosing Bottlenecks in Deep Q-learning Algorithms
Authors: Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we aim to experimentally investigate potential issues in Q-learning, by means of a unit testing framework where we can utilize oracles to disentangle sources of error. Specifically, we investigate questions related to function approximation, sampling error and nonstationarity, and where available, verify if trends found in oracle settings hold true with deep RL methods. |
| Researcher Affiliation | Academia | Justin Fu*, Aviral Kumar*, Matthew Soh, Sergey Levine (UC Berkeley). Correspondence to: Justin Fu <justinjfu@eecs.berkeley.edu>, Aviral Kumar <aviralk@berkeley.edu>. |
| Pseudocode | Yes | Algorithm 1 Exact-FQI (Section 4), Algorithm 2 Sampled-FQI (Section 4), Algorithm 3 Replay-FQI (Section 4). |
| Open Source Code | No | No explicit statement providing access to source code for the methodology described in this paper, nor a direct link to a code repository, was found. |
| Open Datasets | Yes | We selected 8 tabular domains, each with different qualitative attributes, including: gridworlds of varying sizes and observations, blind Cliffwalk (Schaul et al., 2015), discretized Pendulum and Mountain Car based on OpenAI Gym (Plappert et al., 2018), and a sparsely connected graph. |
| Dataset Splits | No | The paper mentions measuring 'validation errors' and 'on-policy validation loss' (Section 5.1, Figure 3) but does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, and testing. |
| Hardware Specification | No | The acknowledgements state: 'We thank Google, NVIDIA, and Amazon for providing computational resources.' (Acknowledgements) However, no specific hardware details such as GPU/CPU models, processor types, or memory amounts are provided for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') are mentioned in the paper. |
| Experiment Setup | Yes | Throughout our experiments, we use 2-layer ReLU networks, denoted by a tuple (N, N) where N represents the number of units in a layer. |
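The pseudocode row above refers to the paper's Exact-FQI variant, which computes the Bellman backup exactly over all state-action pairs and then projects it onto a function class. A minimal NumPy sketch of that structure on a toy random MDP is shown below; the MDP, the random seed, and the `project` argument are illustrative assumptions, and the tabular (identity) projection used here stands in for the paper's 2-layer ReLU network projection, where the projection is what introduces approximation error.

```python
import numpy as np

def exact_fqi(P, R, gamma, project, iters=500):
    """Sketch of Exact-FQI: exact Bellman backup over all (s, a) pairs,
    followed by projection onto a function class via `project`."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        target = R + gamma * P @ Q.max(axis=1)  # exact backup (T Q)(s, a)
        Q = project(target)                     # projection step
    return Q

rng = np.random.default_rng(0)
S, A, gamma = 20, 4, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a, s'] transition probs
R = rng.uniform(size=(S, A))                # R[s, a] rewards

# With a fully expressive (tabular) function class the projection is the
# identity and Exact-FQI reduces to value iteration, so the Bellman
# residual contracts to (near) zero.
Q = exact_fqi(P, R, gamma, project=lambda t: t)
residual = np.abs(R + gamma * P @ Q.max(axis=1) - Q).max()
print(residual < 1e-6)  # True
```

Sampled-FQI and Replay-FQI in the paper replace the exact backup with sampled transitions (and a replay buffer), which this sketch deliberately omits.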