Path Consistency Learning in Tsallis Entropy Regularized MDPs

Authors: Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.
Researcher Affiliation | Industry | Google Brain; DeepMind.
Pseudocode | Yes | A pseudo-code of our sparse PCL algorithm can be found in Algorithm 1 in the Appendix A.
Open Source Code | No | The paper contains no explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We demonstrate the effectiveness of the sparse PCL algorithm by comparing its performance with that of the soft PCL algorithm on a number of RL environments available in the OpenAI Gym environment (Brockman et al., 2016).
Dataset Splits | No | The paper does not provide explicit training/validation/test splits (percentages or counts); it mentions training curves and Monte Carlo trials but no detailed data partitioning.
Hardware Specification | No | The paper gives no details about the hardware used to run the experiments, such as CPU or GPU models.
Software Dependencies | No | The paper mentions the OpenAI Gym environment and a recurrent neural network, but does not list specific software components with version numbers (e.g., Python, TensorFlow/PyTorch versions).
Experiment Setup | Yes | For each task and each PCL algorithm, we perform a hyper-parameter search to find the optimal regularization weight... The functions V, µ, λ, and π in the consistency equations are parameterized with a recurrent neural network with multiple heads... We discretize each continuous action with either one of the following grids: {−1, 0, 1} and {−1, −0.5, 0, 0.5, 1}.
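
Since the paper releases no code, the following is only a rough sketch of the action-discretization step quoted in the Experiment Setup row: each continuous action dimension is restricted to one of the listed grids, and the discrete action set is the Cartesian product over dimensions. It uses the standard Gym ActionWrapper interface; the environment id and the wrapper design are illustrative assumptions, not the authors' implementation.

```python
import itertools

import gym
import numpy as np


class DiscretizedActionWrapper(gym.ActionWrapper):
    """Expose a continuous-control Gym task through a finite action set."""

    def __init__(self, env, grid=(-1.0, 0.0, 1.0)):
        super().__init__(env)
        dim = env.action_space.shape[0]
        # One discrete action per combination of grid values across dimensions.
        self._actions = np.array(
            list(itertools.product(grid, repeat=dim)), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, index):
        # Map a discrete action index back to its continuous action vector.
        return self._actions[index]


# Coarse and fine grids from the quoted setup; the task name is an assumption.
coarse = DiscretizedActionWrapper(gym.make("HalfCheetah-v2"),
                                  grid=(-1.0, 0.0, 1.0))
fine = DiscretizedActionWrapper(gym.make("HalfCheetah-v2"),
                                grid=(-1.0, -0.5, 0.0, 0.5, 1.0))
```

With the five-point grid, a task with d action dimensions yields 5^d discrete actions, which is the large-action-space regime the paper emphasizes when comparing sparse PCL with soft PCL.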