DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs

Authors: Aayam Kumar Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In theory, we show conditions that allow for lower-bounding the performance of DAC-MDP solutions. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large complex offline RL problems.
Researcher Affiliation | Academia | Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern; Oregon State University, Corvallis, OR 97330, USA; {shrestaa, leestef, tadepall, alan.fern}@oregonstate.edu
Pseudocode | Yes | Pseudocode 1: GPU Value Iteration Kernel ... Pseudocode 2: GPU Value Iteration Function. (An illustrative value-iteration sketch appears after this table.)
Open Source Code | Yes | As an additional engineering contribution, this implementation will be made public.
Open Datasets | No | The paper states, 'Following recent work (Fujimoto et al., 2019), we generate datasets by first training a DQN agent for each game.' and 'We generate three datasets of size 100k each:'. This indicates the datasets were generated by the authors; no link or formal citation for public access is provided.
Dataset Splits | No | The paper mentions training and testing, but does not explicitly specify a separate validation set or its split percentages/counts for model selection.
Hardware Specification | Yes | We consider 3 GPUs, namely, GTX 1080ti, RTX 8000, and Tesla V100, each with a CUDA core count of 3584, 4608 and 6912, respectively. The serial implementation is run on an Intel Xeon processor.
Software Dependencies | No | The paper mentions Adam (Kingma & Ba, 2015) as the network optimizer and discusses CUDA, but does not provide specific version numbers for these or other key software libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Table 3 (All Hyperparameters for DQN and BCQ [Atari]) includes: learning rate 0.0000625, discount γ 0.99, mini-batch size 32, target network update frequency 8k training updates, evaluation ϵ 0.001, threshold τ (BCQ) 0.3. (These values are restated as a configuration snippet after this table.)
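
The pseudocode row above refers to the paper's GPU value-iteration routines (Pseudocode 1 and 2), which are not reproduced here. The following is a minimal, hypothetical sketch of batched value iteration over a tabular MDP using PyTorch tensors on a GPU, illustrating the kind of computation such a kernel performs; the function name, tensor shapes, iteration budget, and stopping tolerance are assumptions for illustration, not details taken from the paper.

import torch

def gpu_value_iteration(P, R, gamma=0.99, n_iters=1000, tol=1e-4):
    """Illustrative batched value iteration on a tabular MDP.

    P: (S, A, S) transition tensor on the GPU (rows sum to 1).
    R: (S, A) reward tensor on the GPU.
    Returns the value vector V of shape (S,) and Q-values of shape (S, A).
    NOTE: a sketch only; the paper's GPU kernel may differ substantially.
    """
    S, A, _ = P.shape
    V = torch.zeros(S, device=P.device)
    Q = R.clone()
    for _ in range(n_iters):
        # Batched Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
        Q = R + gamma * torch.einsum('sat,t->sa', P, V)
        V_new = Q.max(dim=1).values  # greedy value update
        if torch.max(torch.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q

# Example usage on a small random MDP (illustrative only):
device = 'cuda' if torch.cuda.is_available() else 'cpu'
P = torch.rand(100, 4, 100, device=device)
P = P / P.sum(dim=-1, keepdim=True)  # normalize into a valid transition tensor
R = torch.rand(100, 4, device=device)
V, Q = gpu_value_iteration(P, R)

The whole backup is expressed as dense tensor operations so that a single iteration runs as a batched GPU computation rather than a state-by-state loop, which is the general idea behind a GPU value-iteration kernel.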
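For readers scripting a reproduction, the Table 3 values quoted in the experiment-setup row can be restated as a configuration dictionary. Only the numeric values come from the paper; the dictionary itself and its key names are illustrative assumptions.

# Hyperparameters reported in Table 3 of the paper (DQN and BCQ, Atari).
# Key names are illustrative; only the values are taken from the paper.
ATARI_HYPERPARAMS = {
    "learning_rate": 6.25e-5,       # reported as 0.0000625
    "discount_gamma": 0.99,
    "mini_batch_size": 32,
    "target_update_freq": 8_000,    # training updates between target-network syncs
    "evaluation_epsilon": 0.001,
    "bcq_threshold_tau": 0.3,       # BCQ action-filtering threshold
}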