Diagnosing Bottlenecks in Deep Q-learning Algorithms

Authors: Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we aim to experimentally investigate potential issues in Q-learning, by means of a unit testing framework where we can utilize oracles to disentangle sources of error. Specifically, we investigate questions related to function approximation, sampling error and nonstationarity, and where available, verify if trends found in oracle settings hold true with deep RL methods."
Researcher Affiliation | Academia | "Justin Fu*, Aviral Kumar*, Matthew Soh, Sergey Levine (UC Berkeley). Correspondence to: Justin Fu <justinjfu@eecs.berkeley.edu>, Aviral Kumar <aviralk@berkeley.edu>."
Pseudocode | Yes | Algorithm 1: Exact-FQI, Algorithm 2: Sampled-FQI, and Algorithm 3: Replay-FQI (all in Section 4).
Open Source Code | No | No explicit statement providing access to source code for the methodology described in this paper, nor a direct link to a code repository, was found.
Open Datasets | Yes | "We selected 8 tabular domains, each with different qualitative attributes, including: gridworlds of varying sizes and observations, blind Cliffwalk (Schaul et al., 2015), discretized Pendulum and Mountain Car based on OpenAI Gym (Plappert et al., 2018), and a sparsely connected graph."
Dataset Splits | No | The paper mentions measuring "validation errors" and "on-policy validation loss" (Section 5.1, Figure 3) but does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, and testing.
Hardware Specification | No | The acknowledgements state: "We thank Google, NVIDIA, and Amazon for providing computational resources." However, no specific hardware details such as GPU/CPU models, processor types, or memory amounts are provided for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8") are mentioned in the paper.
Experiment Setup | Yes | "Throughout our experiments, we use 2-layer ReLU networks, denoted by a tuple (N, N) where N represents the number of units in a layer."
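The pseudocode row above names three fitted Q-iteration variants (Exact-FQI, Sampled-FQI, Replay-FQI). As a rough illustration of the common core, here is a minimal tabular sketch of the exact (oracle-backup) variant: every (state, action) pair is backed up each iteration using full knowledge of the dynamics. The function name `exact_fqi`, the toy MDP below, and all parameter choices are our own illustration, not taken from the paper; the paper's actual algorithms use neural network function approximators, where the projection step is a regression rather than an exact assignment.

```python
import numpy as np

def exact_fqi(P, R, gamma=0.9, iters=300):
    """Exact tabular fitted Q-iteration (illustrative sketch).

    P: (S, A, S) transition probabilities, P[s, a, s'] = Pr(s' | s, a)
    R: (S, A) expected immediate rewards
    Returns the (S, A) Q-value table after `iters` full Bellman backups.
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Bellman optimality backup over ALL (s, a) pairs at once:
        # target(s, a) = R(s, a) + gamma * E_{s'}[max_a' Q(s', a')]
        target = R + gamma * P @ Q.max(axis=1)
        # With a tabular "function class" the projection onto the class
        # is exact, so the new Q is simply the backup target.
        Q = target
    return Q
```

With function approximation, the assignment `Q = target` would instead become a supervised regression of the network onto `target`, which is exactly the step where the paper's function-approximation and sampling errors enter.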