Batch Reinforcement Learning Through Continuation Method
Authors: Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results on a variety of control tasks, game environments and a recommendation task to empirically demonstrate the efficacy of our proposed method. ... (Section 4, Experiments) We evaluate our method with several baselines on continuous control tasks. |
| Researcher Affiliation | Collaboration | University of Michigan; Google AI |
| Pseudocode | Yes | Algorithm 1 Soft Policy Iteration through Continuation Method |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use a publicly available dataset MovieLens-1M, a popular benchmark for recommender systems. ... We focus on eight games and generate the datasets as discussed in Fujimoto et al. [13]. |
| Dataset Splits | No | The paper mentions using a 'training dataset D' and a 'held-out test set' for evaluation but does not specify explicit dataset splits (e.g., percentages or counts) for train, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | We set λ to a large value initially and let the KL divergence term dominate the objective, thus performing behavior cloning. We record a moving average of the Q value estimation variance var(Q_θ, λ_0) over 1000 updates at the end of the phase. After that, we decay the temperature gradually by a factor of 0.9 every I steps. |
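
The Experiment Setup row describes the paper's continuation schedule: begin with a large temperature λ so the KL term dominates the objective (behavior cloning), track a moving average of the Q-value estimation variance at the end of that phase, then decay λ geometrically every I steps. Below is a minimal Python sketch of that schedule under stated assumptions; the names (`anneal_temperature`, `QVarianceEMA`, `decay_every`) are illustrative only and do not come from the authors' code, which the paper does not release.

```python
# Minimal sketch of the temperature (lambda) annealing schedule described in the
# Experiment Setup row. All names and default values are assumptions for
# illustration, not the authors' implementation.

def anneal_temperature(initial_lambda=1.0, decay=0.9, decay_every=1000,
                       total_steps=100_000):
    """Yield (step, lam) pairs: start with a large lambda so the KL term
    dominates (behavior cloning), then decay lambda by a fixed factor
    every `decay_every` updates, following the continuation schedule."""
    lam = initial_lambda
    for step in range(total_steps):
        yield step, lam
        if (step + 1) % decay_every == 0:
            lam *= decay  # gradual geometric decay of the temperature


class QVarianceEMA:
    """Exponential moving average of the Q-value estimation variance,
    recorded over the final updates of the behavior-cloning phase
    (the paper reports a moving average over 1000 updates)."""

    def __init__(self, momentum=0.999):
        self.momentum = momentum
        self.value = None

    def update(self, batch_q_variance):
        if self.value is None:
            self.value = batch_q_variance
        else:
            self.value = (self.momentum * self.value
                          + (1.0 - self.momentum) * batch_q_variance)
        return self.value
```

As a usage note, the EMA would be updated once per gradient step with the variance of the Q estimates on the current batch, and the recorded value at the end of the behavior-cloning phase would serve as the reference point before the temperature decay begins.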