Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Goal-Conditioned Q-learning as Knowledge Distillation

Authors: Alexander Levine, Soheil Feizi

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
Researcher Affiliation	Academia	University of Maryland, College Park, Maryland, USA
Pseudocode	No	The paper describes the proposed methods and loss functions mathematically and in prose, but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code and appendix are available at https://github.com/alevine0/ReenGAGE.
Open Datasets	Yes	We tested our method on HandReach, the environment from the OpenAI Gym Robotics suite (Plappert et al. 2018) with the highest-dimensional goal space (d = 15).
Dataset Splits	No	The paper performs a grid search over hyperparameters to find the 'best hyperparameter settings' for evaluation, but it does not specify explicit training/validation/test dataset splits with percentages or counts for a static dataset, which is typical for supervised learning.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies	No	The paper mentions using established algorithms and environments like DDPG, HER, SAC, and OpenAI Gym, but it does not specify software versions (e.g., Python 3.x, PyTorch 1.x, or specific library versions) needed for reproduction.
Experiment Setup	Yes	For the baseline and each value of α, we performed a grid search over learning rates {0.00025, 0.0005, 0.001, 0.0015} and batch sizes {128, 256, 512}; the curves shown represent the best hyperparameter settings for each α, defined as maximizing the area under the curves. See appendix for results for all hyperparameter settings. Other hyperparameters were kept fixed and are listed in the appendix.
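
The selection procedure quoted under Experiment Setup (grid search over learning rate and batch size, keeping the setting with the largest area under the curve) can be sketched as follows. This is a minimal illustration, not the authors' code: `run_fn`, `toy_run`, and `select_best` are hypothetical names, the trapezoidal rule is an assumed way of computing the area, and the toy curve stands in for a real training run. In the paper the search is repeated for the baseline and for each value of α.

```python
from itertools import product

# Grid values as quoted in the Experiment Setup row.
LEARNING_RATES = [0.00025, 0.0005, 0.001, 0.0015]
BATCH_SIZES = [128, 256, 512]


def area_under_curve(xs, ys):
    """Trapezoidal area under a curve (assumed; the paper does not state the rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def select_best(run_fn):
    """Return (auc, learning_rate, batch_size) for the setting with the largest area.

    run_fn is a hypothetical callable: run_fn(lr, batch_size) -> (steps, scores).
    """
    best = None
    for lr, batch_size in product(LEARNING_RATES, BATCH_SIZES):
        steps, scores = run_fn(lr, batch_size)
        auc = area_under_curve(steps, scores)
        if best is None or auc > best[0]:
            best = (auc, lr, batch_size)
    return best


def toy_run(lr, batch_size):
    """Placeholder for a training run; returns a synthetic linear curve."""
    steps = [0, 1, 2, 3]
    peak = 1.0 - abs(lr - 0.001) * 100 - abs(batch_size - 256) / 1000
    return steps, [peak * s / 3 for s in steps]


best_auc, best_lr, best_batch_size = select_best(toy_run)
```

With the synthetic `toy_run`, the search visits all 12 grid points and picks the setting whose curve encloses the most area; swapping in a function that trains an agent and returns its evaluation curve would give the per-α selection the row describes.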