Batch Reinforcement Learning Through Continuation Method

Authors: Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present results on a variety of control tasks, game environments and a recommendation task to empirically demonstrate the efficacy of our proposed method. ... (Section 4, Experiments) We evaluate our method with several baselines on continuous control tasks.
Researcher Affiliation | Collaboration | 1 University of Michigan, 2 Google AI
Pseudocode | Yes | Algorithm 1: Soft Policy Iteration through Continuation Method
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We use a publicly available dataset, MovieLens-1M, a popular benchmark for recommender systems. ... We focus on eight games and generate the datasets as discussed in Fujimoto et al. [13].
Dataset Splits | No | The paper mentions using a training dataset D and a held-out test set for evaluation but does not specify explicit dataset splits (e.g., percentages or counts) for train, validation, or test sets.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | We set λ to a large value initially and let the KL divergence term dominate the objective, thus performing behavior cloning. We record a moving average of the Q value estimation variance var(Q, λ0) over 1000 updates at the end of the phase. After that, we decay the temperature gradually, λ ← 0.9λ, every I steps.
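
The setup quoted above (a KL-regularized objective whose temperature λ starts large enough that training amounts to behavior cloning, a moving average of the Q-value estimation variance, and a geometric decay of λ every I steps) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' implementation (the paper releases no code); the names kl_regularized_loss, anneal_temperature, decay_interval_I, and the random placeholder batches are hypothetical.

```python
import numpy as np

def kl_regularized_loss(q_values, log_pi, log_behavior, lam):
    """Relaxed actor objective: maximize Q while staying close to the behavior
    policy.  With a large lam the KL term dominates and the update reduces to
    behavior cloning; as lam shrinks, the Q term takes over."""
    kl = log_pi - log_behavior               # Monte-Carlo estimate of KL(pi || beta)
    return -(q_values - lam * kl).mean()     # minimize the negative relaxed objective

def anneal_temperature(lam, decay=0.9):
    """Continuation step: shrink the KL coefficient, lam <- decay * lam."""
    return decay * lam

# Hypothetical offline training loop illustrating the quoted schedule:
# start with a large lam (behavior cloning), track a moving average of the
# Q-estimate variance, and decay lam by 0.9 every I gradient steps.
lam = 100.0               # large initial temperature, KL term dominates
decay_interval_I = 1000   # "I" steps between decays (illustrative value)
q_var_ma = None           # moving average of Q-value estimation variance

for step in range(1, 10001):
    # Placeholder per-batch quantities; real code would compute these from the
    # critic, the learned policy, and a cloned estimate of the behavior policy,
    # using only transitions from the fixed dataset D.
    q_values = np.random.randn(256)
    log_pi = np.random.randn(256)
    log_behavior = np.random.randn(256)

    loss = kl_regularized_loss(q_values, log_pi, log_behavior, lam)
    # optimizer.zero_grad(); loss.backward(); optimizer.step()   # with a real model

    # Moving average of var(Q) over recent updates.
    q_var = np.var(q_values)
    q_var_ma = q_var if q_var_ma is None else 0.999 * q_var_ma + 0.001 * q_var

    if step % decay_interval_I == 0:
        lam = anneal_temperature(lam)        # lam <- 0.9 * lam
```

The only point of the sketch is the continuation schedule itself: the optimization problem changes slowly, from imitation of the behavior policy toward value maximization, as the temperature decays.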