I²HRL: Interactive Influence-based Hierarchical Reinforcement Learning
Authors: Rundong Wang, Runsheng Yu, Bo An, Zinovi Rabinovich
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the effectiveness of the proposed solution in several tasks in MuJoCo domains by demonstrating that our approach can significantly boost the learning performance and accelerate learning compared with state-of-the-art HRL methods. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, Nanyang Technological University, Singapore |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We evaluate and analyze our methods in the benchmarking hierarchical tasks [Duan et al., 2016]. These environments were all simulated using the MuJoCo physics engine for model-based control. The tasks are as follows: Ant Gather. Ant Maze. Ant Push. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Both levels of I2HRL utilize TD3. The low-level and high-level critic updates every single step and every 10 steps respectively. The low-level and high-level actor updates every 2 steps and every 20 steps respectively. We use the Adam optimizer with a learning rate of 3e-4 for the actor and critic of both levels of policies. We set the high-level policy decision interval k and the length of trajectories for the low-level policy representation c as 10. Discount γ = 0.99, replay buffer size is 200,000 for both levels of policies. The method-specific hyper-parameters (β and βr) are fine-tuned for each task. |
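
The reported setup can be summarized as a small configuration sketch. The snippet below is a minimal, hypothetical collection of the stated hyper-parameters in plain Python; the class and field names (`LevelConfig`, `HRLConfig`, etc.) are illustrative assumptions, not taken from the paper's (unreleased) code.

```python
# Hypothetical config sketch of the reported I2HRL hyper-parameters.
# Only the values quoted in the table above are grounded in the paper;
# all names and structure here are assumptions for illustration.

from dataclasses import dataclass


@dataclass
class LevelConfig:
    critic_update_every: int        # env steps between critic updates
    actor_update_every: int         # env steps between actor updates
    lr: float = 3e-4                # Adam learning rate for actor and critic
    gamma: float = 0.99             # discount factor
    buffer_size: int = 200_000      # replay buffer capacity


@dataclass
class HRLConfig:
    low: LevelConfig
    high: LevelConfig
    k: int = 10                     # high-level decision interval
    c: int = 10                     # low-level trajectory length for the policy representation
    # beta and beta_r are method-specific and tuned per task (values not reported)


config = HRLConfig(
    low=LevelConfig(critic_update_every=1, actor_update_every=2),
    high=LevelConfig(critic_update_every=10, actor_update_every=20),
)
```

Since the paper releases no code, hardware specification, or software versions, such a config captures everything that is directly reproducible from the text; the per-task values of β and βr would still have to be re-tuned.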