On Principled Entropy Exploration in Policy Optimization
Authors: Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods in a set of benchmark tasks. |
| Researcher Affiliation | Collaboration | Jincheng Mei¹, Chenjun Xiao¹, Ruitong Huang², Dale Schuurmans¹ and Martin Müller¹; ¹University of Alberta, ²Borealis AI Lab |
| Pseudocode | Yes | Algorithm 1 The ECPO algorithm |
| Open Source Code | No | The paper states 'All of these algorithms are implemented in rlkit' (footnote 2: https://github.com/vitchyr/rlkit). This refers to a third-party library used by the authors, not their own source code for the specific methodology (ECPO/ECAC) described in the paper. There is no explicit statement or link indicating that their implementation of ECPO/ECAC is open-sourced. |
| Open Datasets | Yes | We further test ECPO on five algorithmic tasks from the OpenAI gym [Brockman et al., 2016] library... Second, we test ECAC on continuous-control benchmarks from the OpenAI Gym, utilizing the MuJoCo environment [Brockman et al., 2016; Todorov et al., 2012] |
| Dataset Splits | No | The paper refers to 'evaluation rollouts' and uses well-known benchmark tasks, but it does not provide specific details on how the datasets were split into training, validation, and test sets (e.g., exact percentages or sample counts). The usage of standard splits for these benchmarks is not explicitly stated. |
| Hardware Specification | No | The paper does not specify any hardware components used for experiments, such as specific GPU/CPU models, memory configurations, or computing cluster details. |
| Software Dependencies | No | The paper mentions 'rlkit' as the implementation platform ('All of these algorithms are implemented in rlkit', footnote 2: https://github.com/vitchyr/rlkit), but it does not provide specific version numbers for rlkit or any other crucial software libraries (e.g., Python, TensorFlow, PyTorch, CUDA, etc.). |
| Experiment Setup | No | The paper mentions 'Implementation details are provided in the appendix' (Section 5), but these details are not present in the main body of the paper. Without access to the appendix, the main text does not provide specific hyperparameters (e.g., learning rate, batch size) or detailed training configurations. |