Path Consistency Learning in Tsallis Entropy Regularized MDPs
Authors: Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions. |
| Researcher Affiliation | Industry | Google Brain, DeepMind. |
| Pseudocode | Yes | A pseudo-code of our sparse PCL algorithm can be found in Algorithm 1 in Appendix A. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We demonstrate the effectiveness of the sparse PCL algorithm by comparing its performance with that of the soft PCL algorithm on a number of RL environments available in the OpenAI Gym environment (Brockman et al., 2016). |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits with percentages or counts. It mentions training curves and Monte Carlo trials but no detailed data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions the 'OpenAI Gym environment' and a 'recurrent neural network' but does not list specific software components with version numbers (e.g., Python, TensorFlow/PyTorch versions). |
| Experiment Setup | Yes | For each task and each PCL algorithm, we perform a hyper-parameter search to find the optimal regularization weight... The functions V, µ, λ, and π in the consistency equations are parameterized with a recurrent neural network with multiple heads... We discretize each continuous action with one of the following grids: {−1, 0, 1} and {−1, −0.5, 0, 0.5, 1}. (See the discretization sketch below the table.) |
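
To make the experiment-setup description concrete, below is a minimal, hedged Python sketch of the action discretization it quotes: each continuous action dimension is restricted to one of the listed grids, and the discrete action set is the Cartesian product of the per-dimension grids. Only the grid values come from the quoted setup; the environment name (`HalfCheetah-v2`) and the helper `to_continuous` are illustrative assumptions, not details taken from the paper.

```python
import itertools

import gym
import numpy as np

# Grid values quoted in the Experiment Setup row; the paper uses either a
# 3-point grid {-1, 0, 1} or this 5-point grid per continuous action dimension.
GRID = (-1.0, -0.5, 0.0, 0.5, 1.0)

# Example continuous-control task from OpenAI Gym (assumed here for
# illustration; the paper evaluates on several Gym environments).
env = gym.make("HalfCheetah-v2")
act_dim = env.action_space.shape[0]

# The discrete action set is the Cartesian product of the per-dimension grids,
# so it grows as |GRID| ** act_dim -- the "large number of actions" regime in
# which the paper reports sparse PCL is most advantageous.
joint_actions = np.array(list(itertools.product(GRID, repeat=act_dim)))
print("number of discrete actions:", len(joint_actions))


def to_continuous(action_index: int) -> np.ndarray:
    """Map a discrete action index back to a continuous Gym action vector."""
    return joint_actions[action_index]


# Usage sketch (classic Gym 4-tuple step API): an agent over the discretized
# space picks an index, which is converted before being passed to env.step().
obs = env.reset()
obs, reward, done, info = env.step(to_continuous(0))
```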