On Principled Entropy Exploration in Policy Optimization

Authors: Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller

IJCAI 2019

Reproducibility Variable Result LLM Response
Research Type | Experimental | Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods on a set of benchmark tasks.
Researcher Affiliation | Collaboration | Jincheng Mei, Chenjun Xiao, Dale Schuurmans, and Martin Müller (University of Alberta); Ruitong Huang (Borealis AI Lab)
Pseudocode | Yes | Algorithm 1, "The ECPO algorithm"
Open Source Code | No | The paper states that 'All of these algorithms are implemented in rlkit' (https://github.com/vitchyr/rlkit). This refers to a third-party library used by the authors, not their own source code for the specific methodology (ECPO/ECAC) described in the paper. There is no explicit statement or link indicating that their implementation of ECPO/ECAC is open-sourced.
Open Datasets | Yes | We further test ECPO on five algorithmic tasks from the OpenAI Gym [Brockman et al., 2016] library... Second, we test ECAC on continuous-control benchmarks from the OpenAI Gym, utilizing the MuJoCo environment [Brockman et al., 2016; Todorov et al., 2012]
Dataset Splits | No | The paper refers to 'evaluation rollouts' and uses well-known benchmark tasks, but it does not provide specific details on how the datasets were split into training, validation, and test sets (e.g., exact percentages or sample counts). The use of standard splits for these benchmarks is not explicitly stated.
Hardware Specification | No | The paper does not specify any hardware used for the experiments, such as GPU/CPU models, memory configurations, or computing-cluster details.
Software Dependencies | No | The paper mentions rlkit as the implementation platform ('All of these algorithms are implemented in rlkit', https://github.com/vitchyr/rlkit), but it does not provide version numbers for rlkit or for any other crucial software libraries (e.g., Python, TensorFlow, PyTorch, CUDA).
Experiment Setup | No | The paper notes that 'Implementation details are provided in the appendix' (Section 5), but these details are absent from the main body. Without access to the appendix, the main text provides no specific hyperparameters (e.g., learning rate, batch size) or detailed training configurations.