Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Goal-Directed Planning via Hindsight Experience Replay

Authors: Lorenzo Moro, Amarildo Likmeta, Enrico Prati, Marcello Restelli

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of the proposed approach through an extensive empirical evaluation in several simulated domains, including a novel application to a quantum compiling domain.
Researcher Affiliation	Academia	(1) DEIB, Politecnico di Milano, Milan, Italy; (2) CNR-IFN, Milan, Italy; (3) FABIT, Università di Bologna, Bologna, Italy
Pseudocode	Yes	Algorithm 1: Alpha Zero HER. Initialize memory buffer $B$; initialize policy $\pi_\theta$ and value network $v_\theta$; for epoch $= 1, \dots, N$ do: for episode $= 1, \dots, M$ do: experiences $\leftarrow \{\}$; $s_t \sim \mu$ (sample initial state); while not done do: $p_t, a_t \leftarrow \mathrm{MCTS}(s_t, \pi_\theta, v_\theta)$; $s_{t+1}, r_t, \mathrm{done} \leftarrow \mathrm{applyAction}(a_t)$; experiences $\leftarrow$ experiences $\cup\ (s_t, p_t, r_t)$; $s_t \leftarrow s_{t+1}$; end while; store every experience $(s_t, p_t, z_t)$ in $B$, where $z_t = \sum_{i=t}^{T} \gamma^{i-t} r_i$; for $t$ in episode experiences do (generate new experiences): $G \leftarrow$ sample $k$ goals from future visited states $s_j$ where $j > t$; for $g$ in $G$ do: $r_t^g \leftarrow r(s_t, a_t, g)$; end for; store every $(s_t, p_t, z_t^g)$ in $B$, where $z_t^g = \sum_{i=t}^{T} \gamma^{i-t} r_i^g$; end for; update $\pi_\theta, v_\theta$ according to Equation 4; end for; end for.
Open Source Code	No	The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets	No	The paper describes custom simulated environments (Bit Flip, 2D Navigation, 2D Maze, Quantum Compiler) where data is generated through interaction. It does not refer to or provide access to a specific publicly available dataset used for training.
Dataset Splits	No	The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing, as it operates within dynamically interacting environments rather than fixed datasets.
Hardware Specification	No	The paper states 'We ran each experiment in a single multi-core machine, with no GPUs.' This is not specific enough to identify exact CPU models, processor types, or memory details.
Software Dependencies	No	The paper mentions 'stable-baselines' and 'hyperopt' but does not specify their version numbers. No other software dependencies are mentioned with version numbers.
Experiment Setup	Yes	In this section, we provide the hyper-parameters employed in the experiments presented in this work. Table 1 and Table 2 provide a list of hyperparameters employed for both Alpha Zero and Alpha Zero HER, without being optimized. Table 1 (hyperparameter: value): Optimizer: Adam; c_uct: 2.0; Discount factor: 0.999; Episodes per epoch: 50. Table 2 (hyperparameter: value per environment): Learning rate: Bit Flip 0.0005, 2D Navigation 0.001, 2D Maze 0.0005, Quantum Compiling 0.00005; Batch size: Bit Flip 256, 2D Navigation 512, 2D Maze 512, Quantum Compiling 512; Search iterations: Bit Flip 20, 2D Navigation 70, 2D Maze 120, Quantum Compiling 20.
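
The hindsight relabeling step quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' code, which the table reports as unreleased. The transition layout, the function names (`discounted_returns`, `sparse_goal_reward`, `hindsight_relabel`), the sparse 0/1 reward, and storing the goal alongside each sample are all assumptions made for the example. The return follows the quoted formula literally, summing discounted relabeled rewards from step t to the end of the episode.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

# (state, mcts_policy, action, next_state); this layout is an assumption.
Transition = Tuple[Hashable, Sequence[float], int, Hashable]
RewardFn = Callable[[Hashable, int, Hashable, Hashable], float]


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return z_t = sum_{i=t}^{T} gamma^(i-t) * r_i for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each z_t reuses z_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sparse_goal_reward(state, action, next_state, goal) -> float:
    """Hypothetical reward: 1 when the transition lands on the goal, else 0."""
    return 1.0 if next_state == goal else 0.0


def hindsight_relabel(
    episode: Sequence[Transition],
    reward_fn: RewardFn,
    k: int,
    gamma: float,
    rng: random.Random,
) -> List[Tuple[Hashable, Hashable, Sequence[float], float]]:
    """Relabel each step with up to k goals drawn from later visited states."""
    samples = []
    for t, (state, policy, _action, _next_state) in enumerate(episode):
        # Future visited states s_j with j > t.
        future_states = [tr[3] for tr in episode[t:]]
        goals = rng.sample(future_states, min(k, len(future_states)))
        for goal in goals:
            # Recompute rewards for steps t..T as if `goal` had been the target.
            relabeled = [
                reward_fn(s, a, s_next, goal)
                for (s, _p, a, s_next) in episode[t:]
            ]
            z_t_g = discounted_returns(relabeled, gamma)[0]
            samples.append((state, goal, policy, z_t_g))
    return samples
```

A caller would pass one finished episode together with the discount factor from Table 1 (0.999) and append the returned tuples to the replay buffer next to the original experiences. Whether the return should instead be truncated once the substituted goal is reached is not specified in the quoted pseudocode, so the sketch leaves the sum running to the end of the episode.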