Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
Authors: Michael Janner, Igor Mordatch, Sergey Levine
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation is designed to study the viability of γ-models as a replacement of conventional single-step models for long-horizon state prediction and model-based control. Figure 5 shows learning curves for all methods. We find that γ-MVE converges faster than prior algorithms, twice as quickly as SAC, while retaining their asymptotic performance. |
| Researcher Affiliation | Collaboration | Michael Janner¹ Igor Mordatch² Sergey Levine¹,² ¹UC Berkeley ²Google Brain EMAIL EMAIL |
| Pseudocode | Yes | Algorithm 1 γ-model training without density evaluation. Algorithm 2 γ-model training with density evaluation. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | We investigate γ-model predictions as a function of discount in continuous-action versions of two benchmark environments suitable for visualization: acrobot (Sutton, 1996) and pendulum. The training data come from a mixture distribution over all intermediate policies of 200 epochs of optimization with soft-actor critic (SAC; Haarnoja et al. 2018). |
| Dataset Splits | No | The paper mentions 'training data' but does not specify explicit training, validation, or test dataset splits with percentages, counts, or a detailed splitting methodology. |
| Hardware Specification | No | The acknowledgements section mentions 'computational resource donations from Amazon', but no specific hardware details such as GPU/CPU models, processors, or cloud instance specifications are provided for running the experiments. |
| Software Dependencies | Yes | All models were implemented using PyTorch 1.5.0 and CUDA 10.2. |
| Experiment Setup | Yes | Further implementation details, including all hyperparameter settings and network architectures, are included in Appendix C. (Appendix C contains: MLPs with 2 hidden layers of 256 units and ReLU activation functions, learning rate of 3e-4, batch size of 256, target update coefficient of 0.005, discount factor γ = 0.99 for all MuJoCo experiments, etc.) |
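
For readers who want to carry the Experiment Setup values into their own code, the settings quoted in that row can be collected into a small configuration object. The following is a minimal sketch in plain Python, not the authors' implementation: the names `GammaModelConfig`, `soft_update`, and `effective_horizon` are hypothetical. It also assumes that the "target update coefficient" refers to the usual Polyak averaging of target-network parameters, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaModelConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    hidden_layers: int = 2
    hidden_units: int = 256
    activation: str = "relu"
    learning_rate: float = 3e-4
    batch_size: int = 256
    target_update_coef: float = 0.005
    discount: float = 0.99


def soft_update(target, online, tau):
    """Polyak averaging (an assumption about how the coefficient is used):
    each target parameter moves a fraction tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def effective_horizon(discount):
    """Mean of a geometric distribution over future steps with
    success probability (1 - discount), i.e. 1 / (1 - discount)."""
    return 1.0 / (1.0 - discount)


config = GammaModelConfig()

# Toy parameter vectors, used only to illustrate one update step.
target_params = soft_update([0.0, 1.0], [1.0, 1.0], config.target_update_coef)
horizon = effective_horizon(config.discount)
```

With a discount of 0.99, the geometric weighting over future timesteps has a mean of roughly 100 steps, which gives a rough sense of how far ahead a model trained with this discount is looking.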