On Principled Entropy Exploration in Policy Optimization

Authors: Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller

IJCAI 2019

Reproducibility Variable Result LLM Response
Research Type | Experimental | Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods on a set of benchmark tasks.
Researcher Affiliation | Collaboration | Jincheng Mei, Chenjun Xiao, Dale Schuurmans, and Martin Müller (University of Alberta); Ruitong Huang (Borealis AI Lab)
Pseudocode | Yes | Algorithm 1, "The ECPO algorithm"
Open Source Code | No | The paper states that 'All of these algorithms are implemented in rlkit' (https://github.com/vitchyr/rlkit). This refers to a third-party library used by the authors, not their own source code for the specific methodology (ECPO/ECAC) described in the paper. There is no explicit statement or link indicating that their implementation of ECPO/ECAC is open-sourced.
Open Datasets | Yes | We further test ECPO on five algorithmic tasks from the OpenAI Gym [Brockman et al., 2016] library... Second, we test ECAC on continuous-control benchmarks from the OpenAI Gym, utilizing the MuJoCo environment [Brockman et al., 2016; Todorov et al., 2012]
Dataset Splits | No | The paper refers to 'evaluation rollouts' and uses well-known benchmark tasks, but it does not provide specific details on how the datasets were split into training, validation, and test sets (e.g., exact percentages or sample counts). The use of standard splits for these benchmarks is not explicitly stated.
Hardware Specification | No | The paper does not specify any hardware used for the experiments, such as GPU/CPU models, memory configurations, or computing-cluster details.
Software Dependencies | No | The paper mentions rlkit as the implementation platform ('All of these algorithms are implemented in rlkit', https://github.com/vitchyr/rlkit), but it does not provide version numbers for rlkit or for any other crucial software libraries (e.g., Python, TensorFlow, PyTorch, CUDA).
Experiment Setup | No | The paper notes that 'Implementation details are provided in the appendix' (Section 5), but these details are absent from the main body. Without access to the appendix, the main text provides no specific hyperparameters (e.g., learning rate, batch size) or detailed training configurations.