reproducibilityindex.ai

Batch Reinforcement Learning Through Continuation Method

Authors: Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

ICLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present results on a variety of control tasks, game environments and a recommendation task to empirically demonstrate the efficacy of our proposed method. ... 4 EXPERIMENTS We evaluate our method with several baselines on continuous control tasks.
Researcher Affiliation	Collaboration	(1) University of Michigan, (2) Google AI
Pseudocode	Yes	Algorithm 1 Soft Policy Iteration through Continuation Method
Open Source Code	No	The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets	Yes	We use a publicly available dataset MovieLens-1M, a popular benchmark for recommender system. ... We focus on eight games and generate the datasets as discussed in Fujimoto et al. [13].
Dataset Splits	No	The paper mentions using a 'training dataset D' and a 'held-out test set' for evaluation but does not specify explicit dataset splits (e.g., percentages or counts) for train, validation, or test sets.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not specify software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup	Yes	We set τ to large value initially and let the KL divergence term dominate the objective, thus performing behavior cloning. We record a moving average of the Q value estimation variance var(Q^{τ_0}) over 1000 updates at the end of the phase. After that, we decay the temperature gradually with λ = 0.9 every I steps.
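
The temperature schedule quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the decay is multiplicative (the temperature is scaled by λ = 0.9 once every I steps) and that the first decay happens I steps after the behavior-cloning phase ends. The function name `temperature_at` and the parameters `tau0`, `warmup_steps`, and `decay_every` are hypothetical, and the values in the usage note are placeholders, since the quoted excerpt does not give the initial temperature, the phase length, or I.

```python
def temperature_at(step, tau0, warmup_steps, decay_every, decay_rate=0.9):
    """Return the temperature after `step` updates.

    Assumed schedule (illustrative, not taken from the paper's code):
      * steps [0, warmup_steps): temperature stays at tau0, so the KL term
        dominates and training behaves like behavior cloning;
      * afterwards the temperature is multiplied by `decay_rate` once every
        `decay_every` steps (stepwise geometric decay).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if decay_every <= 0:
        raise ValueError("decay_every must be positive")
    if step < warmup_steps:
        return tau0
    # Number of completed decay intervals since the warm-up phase ended.
    num_decays = (step - warmup_steps) // decay_every
    return tau0 * decay_rate ** num_decays


if __name__ == "__main__":
    # Placeholder values: the source does not state tau0, the phase length, or I.
    for s in (0, 99, 100, 150, 200):
        print(s, temperature_at(s, tau0=10.0, warmup_steps=100, decay_every=50))
```

Running the file prints the temperature at a few steps. With these placeholder values it stays at its initial value through the warm-up phase and then shrinks by a factor of 0.9 every 50 steps.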