Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
Authors: Michael Janner, Igor Mordatch, Sergey Levine
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation is designed to study the viability of γ-models as a replacement of conventional single-step models for long-horizon state prediction and model-based control. Figure 5 shows learning curves for all methods. We find that γ-MVE converges faster than prior algorithms, twice as quickly as SAC, while retaining their asymptotic performance. |
| Researcher Affiliation | Collaboration | Michael Janner (UC Berkeley), Igor Mordatch (Google Brain), Sergey Levine (UC Berkeley, Google Brain); {janner, svlevine}@eecs.berkeley.edu, imordatch@google.com |
| Pseudocode | Yes | Algorithm 1: γ-model training without density evaluation. Algorithm 2: γ-model training with density evaluation. (A hedged sketch of the bootstrapped training step appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | We investigate γ-model predictions as a function of discount in continuous-action versions of two benchmark environments suitable for visualization: acrobot (Sutton, 1996) and pendulum. The training data come from a mixture distribution over all intermediate policies of 200 epochs of optimization with soft actor-critic (SAC; Haarnoja et al. 2018). |
| Dataset Splits | No | The paper mentions 'training data' but does not specify explicit training, validation, or test dataset splits with percentages, counts, or a detailed splitting methodology. |
| Hardware Specification | No | The acknowledgements section mentions 'computational resource donations from Amazon', but no specific hardware details such as GPU/CPU models, processors, or cloud instance specifications are provided for running the experiments. |
| Software Dependencies | Yes | All models were implemented using PyTorch 1.5.0 and CUDA 10.2. |
| Experiment Setup | Yes | Further implementation details, including all hyperparameter settings and network architectures, are included in Appendix C. (Appendix C contains: MLPs with 2 hidden layers of 256 units and ReLU activation functions, learning rate of 3e-4, batch size of 256, target update coefficient of 0.005, discount factor γ = 0.99 for all MuJoCo experiments, etc. These settings are restated in the configuration sketch after the table.) |
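
To make the pseudocode row concrete, the following is a minimal sketch of the bootstrapped γ-model target construction and a single TD-style training step, assuming a conditional diagonal Gaussian generative model trained by maximum likelihood for simplicity. The paper itself uses normalizing flows (with density evaluation) or GANs (without density evaluation); the names `GammaModel` and `bootstrapped_targets`, and all tensor shapes, are our own illustrative choices, not code from the authors' release.

```python
# Sketch only: gamma-model TD training with a bootstrapped target distribution.
# With probability (1 - gamma) the target is the observed next state s';
# with probability gamma it is a sample from a frozen target gamma-model at (s', a').
import copy
import torch
import torch.nn as nn

class GammaModel(nn.Module):
    """mu_theta(s_e | s, a): a stand-in for the paper's generative occupancy model."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim),          # mean and log-std of s_e
        )

    def dist(self, s, a):
        mean, log_std = self.net(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())

def bootstrapped_targets(model_target, s_next, a_next, gamma):
    """Sample from the TD target mixture (1 - gamma) * delta(s') + gamma * mu_target(. | s', a')."""
    with torch.no_grad():
        model_sample = model_target.dist(s_next, a_next).sample()
    keep_next = (torch.rand(s_next.shape[0], 1) < (1.0 - gamma)).float()
    return keep_next * s_next + (1.0 - keep_next) * model_sample

# One training step on a random placeholder batch (s, a, s') with a' ~ pi(.|s').
state_dim, action_dim, gamma = 4, 1, 0.99
model = GammaModel(state_dim, action_dim)
model_target = copy.deepcopy(model)                    # delayed target network
optim = torch.optim.Adam(model.parameters(), lr=3e-4)  # learning rate from Appendix C

s, a = torch.randn(256, state_dim), torch.randn(256, action_dim)
s_next, a_next = torch.randn(256, state_dim), torch.randn(256, action_dim)

targets = bootstrapped_targets(model_target, s_next, a_next, gamma)
loss = -model.dist(s, a).log_prob(targets).sum(-1).mean()   # max-likelihood on TD targets
optim.zero_grad(); loss.backward(); optim.step()

# Soft target update with coefficient tau = 0.005 (Appendix C).
tau = 0.005
for p, p_targ in zip(model.parameters(), model_target.parameters()):
    p_targ.data.mul_(1 - tau).add_(tau * p.data)
```

The two algorithms in the paper differ mainly in how the model is fit to these targets: Algorithm 2 can evaluate densities and use a likelihood-style objective as above, while Algorithm 1 matches target samples with a sample-only (e.g., adversarial) objective.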
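For quick reference, the Appendix C settings quoted in the Experiment Setup row are restated below as a plain-Python configuration dictionary; the dictionary itself and its key names are our own, not part of the authors' code.

```python
# Appendix C hyperparameters, restated for reference only.
gamma_model_config = {
    "hidden_layers": [256, 256],   # MLPs with 2 hidden layers of 256 units
    "activation": "ReLU",
    "learning_rate": 3e-4,
    "batch_size": 256,
    "target_update_tau": 0.005,
    "discount_gamma": 0.99,        # all MuJoCo experiments
}
```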