Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

Authors: Michael Janner, Igor Mordatch, Sergey Levine

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluation is designed to study the viability of γ-models as a replacement for conventional single-step models for long-horizon state prediction and model-based control. Figure 5 shows learning curves for all methods. We find that γ-MVE converges faster than prior algorithms, twice as quickly as SAC, while retaining their asymptotic performance.
Researcher Affiliation | Collaboration | Michael Janner (UC Berkeley), Igor Mordatch (Google Brain), Sergey Levine (UC Berkeley and Google Brain); {janner, svlevine}@eecs.berkeley.edu, imordatch@google.com
Pseudocode | Yes | Algorithm 1: γ-model training without density evaluation. Algorithm 2: γ-model training with density evaluation. (A minimal training sketch based on these algorithms appears after this table.)
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository.
Open Datasets | Yes | We investigate γ-model predictions as a function of discount in continuous-action versions of two benchmark environments suitable for visualization: acrobot (Sutton, 1996) and pendulum. The training data come from a mixture distribution over all intermediate policies of 200 epochs of optimization with soft actor-critic (SAC; Haarnoja et al. 2018).
Dataset Splits | No | The paper mentions 'training data' but does not specify explicit training, validation, or test dataset splits with percentages, counts, or a detailed splitting methodology.
Hardware Specification | No | The acknowledgements section mentions 'computational resource donations from Amazon', but no specific hardware details such as GPU/CPU models, processors, or cloud instance specifications are provided for running the experiments.
Software Dependencies | Yes | All models were implemented using PyTorch 1.5.0 and CUDA 10.2.
Experiment Setup | Yes | Further implementation details, including all hyperparameter settings and network architectures, are included in Appendix C. (Appendix C reports: MLPs with 2 hidden layers of 256 units and ReLU activations, learning rate of 3e-4, batch size of 256, target update coefficient of 0.005, discount factor γ = 0.99 for all MuJoCo experiments, etc.) A configuration sketch matching these reported values appears after the table.
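
The Pseudocode row above refers to the paper's two training procedures. As a rough illustration only, the sketch below shows a bootstrapped, TD-style maximum-likelihood update in the spirit of Algorithm 2 (the density-evaluation variant), written in PyTorch as named under Software Dependencies. The GammaModel class is a stand-in conditional Gaussian rather than the generative model used in the paper, and names such as gamma_model_update, policy, and the batch format are assumptions made for illustration, not the authors' code.

```python
import torch
import torch.nn as nn


class GammaModel(nn.Module):
    """Stand-in conditional Gaussian gamma-model.

    The paper uses a more expressive generative model; a diagonal Gaussian is
    substituted here so the sketch stays short and self-contained.
    """

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, state_dim)
        self.log_std = nn.Linear(hidden, state_dim)

    def dist(self, s, a):
        h = self.trunk(torch.cat([s, a], dim=-1))
        return torch.distributions.Normal(self.mean(h), self.log_std(h).clamp(-5, 2).exp())

    def log_prob(self, s, a, s_e):
        return self.dist(s, a).log_prob(s_e).sum(-1)

    def sample(self, s, a):
        return self.dist(s, a).sample()


def gamma_model_update(model, target_model, policy, batch, optimizer, gamma=0.99, tau=0.005):
    """One bootstrapped maximum-likelihood (TD-style) update on the gamma-model."""
    s, a, s_next = batch  # tensors of shape (batch, state_dim) / (batch, action_dim)
    with torch.no_grad():
        a_next = policy(s_next)                      # a' ~ pi(. | s')
        bootstrap = target_model.sample(s_next, a_next)
        # Bootstrapped target mixture: (1 - gamma) * p(s' | s, a) + gamma * mu_target(. | s', a')
        use_bootstrap = (torch.rand(s.shape[0], 1) < gamma).float()
        s_e = use_bootstrap * bootstrap + (1.0 - use_bootstrap) * s_next

    loss = -model.log_prob(s, a, s_e).mean()         # maximize likelihood of target samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Polyak averaging of the target model (coefficient 0.005 per Appendix C)
    with torch.no_grad():
        for p, p_t in zip(model.parameters(), target_model.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
    return loss.item()
```

Roughly speaking, the variant without density evaluation (Algorithm 1) keeps the same bootstrapped target construction but replaces the likelihood term with a sample-based objective, since the model's density cannot be evaluated directly.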
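
The Experiment Setup row quotes Appendix C's hyperparameters. A minimal sketch of a configuration object and network builder consistent with those reported values follows; the dataclass, its field names, and the choice of Adam in the usage note are illustrative assumptions rather than details confirmed by the paper (apart from the quoted numbers themselves).

```python
import torch
import torch.nn as nn
from dataclasses import dataclass


@dataclass
class GammaModelConfig:
    # Values quoted from Appendix C; the field names themselves are illustrative.
    hidden_layers: int = 2
    hidden_units: int = 256
    learning_rate: float = 3e-4
    batch_size: int = 256
    target_update_tau: float = 0.005
    discount: float = 0.99  # gamma used for all MuJoCo experiments


def build_mlp(in_dim: int, out_dim: int, cfg: GammaModelConfig) -> nn.Sequential:
    """Two hidden layers of 256 units with ReLU activations, per Appendix C."""
    layers, dim = [], in_dim
    for _ in range(cfg.hidden_layers):
        layers += [nn.Linear(dim, cfg.hidden_units), nn.ReLU()]
        dim = cfg.hidden_units
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)


# Example usage (Adam is an assumption; the paper reports only the learning rate):
# cfg = GammaModelConfig()
# net = build_mlp(in_dim=4, out_dim=4, cfg=cfg)
# optimizer = torch.optim.Adam(net.parameters(), lr=cfg.learning_rate)
```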