Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Authors: Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions. |
| Researcher Affiliation | Industry | Google Brain, DeepMind. |
| Pseudocode | Yes | A pseudo-code of our sparse PCL algorithm can be found in Algorithm 1 in the Appendix A. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We demonstrate the effectiveness of the sparse PCL algorithm by comparing its performance with that of the soft PCL algorithm on a number of RL environments available in the Open AI Gym environment (Brockman et al., 2016). |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits with percentages or counts. It mentions training curves and Monte Carlo trials but no detailed data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions 'Open AI Gym environment' and 'recurrent neural network' but does not list specific software components with version numbers (e.g., Python, TensorFlow/PyTorch versions). |
| Experiment Setup | Yes | For each task and each PCL algorithm, we perform a hyper-parameter search to find the optimal regularization weight... The functions V, µ, λ, and in the consistency equations are parameterized with a recurrent neural network with multiple heads... We discretize each continuous action with either one of the following grids: {−1, 0, 1} and {−1, −0.5, 0, 0.5, 1}. |
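To make the action-discretization step in the Experiment Setup row concrete, here is a minimal sketch. It assumes the five-point grid is the symmetric {−1, −0.5, 0, 0.5, 1}, and it assumes the joint discrete action space is the Cartesian product of the per-dimension grids. The source only says that each continuous action is discretized with one of the grids, so this is an illustration, not the authors' implementation. The helpers `build_joint_actions` and `snap_to_grid` are hypothetical names introduced here.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Grids from the Experiment Setup row. The 5-point grid is ASSUMED symmetric.
GRID_3 = (-1.0, 0.0, 1.0)
GRID_5 = (-1.0, -0.5, 0.0, 0.5, 1.0)


def build_joint_actions(grid: Sequence[float], action_dim: int) -> List[Tuple[float, ...]]:
    """Hypothetical helper: enumerate every combination of per-dimension grid values.

    ASSUMPTION: the joint discrete action set is the Cartesian product of the
    per-dimension grids, giving len(grid) ** action_dim discrete actions.
    """
    return list(product(grid, repeat=action_dim))


def snap_to_grid(action: Sequence[float], grid: Sequence[float]) -> Tuple[float, ...]:
    """Hypothetical helper: map each continuous component to its nearest grid value.

    On an exact tie, min() keeps the grid value that appears first.
    """
    return tuple(min(grid, key=lambda g: abs(g - a)) for a in action)


if __name__ == "__main__":
    joint = build_joint_actions(GRID_5, action_dim=3)
    print(len(joint))  # 5 ** 3 joint discrete actions
    print(snap_to_grid([0.3, -0.8, 0.9], GRID_3))
```

Under the Cartesian-product assumption, the number of discrete actions grows exponentially with the action dimension: 3^d for the coarse grid and 5^d for the fine grid. That is the many-action regime in which the Research Type row says sparse PCL shows its advantage.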