Adaptive Order Q-learning
Authors: Tao Tan, Hong Xie, Defu Lian
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We start with tabular MDP experiments to reveal fundamental insights into why Order Q-learning and Ada Order Q-learning can achieve superior performance. We then evaluate the impact of Order DQN and Ada Order DQN in deep reinforcement learning settings. |
| Researcher Affiliation | Academia | Tao Tan¹, Hong Xie², Defu Lian² (¹College of Computer Science, Chongqing University; ²University of Science and Technology of China) |
| Pseudocode | Yes | Algorithm 1 Order Q-learning, Algorithm 2 Order DQN, Algorithm 3 Ada Order Q-learning, Algorithm 4 Ada Order DQN |
| Open Source Code | Yes | The code of all experiments can be found in link1. https://1drv.ms/u/s!Atddp1iaDmL2ghdcHyYXNO785moD |
| Open Datasets | Yes | We introduce three tabular MDP environments: (1) Multi-armed bandit is adapted from [Mannor et al., 2007], which considers a single state with ten actions, where the reward of each action obeys the distribution N(0, 1); (2) A simple MDP environment is depicted in Figure 1, where µ1 = 0.1, σ1 = 1.0, µ2 = 0.1, σ2 = 1.0; (3) Gridworld [Zhang et al., 2017] has four actions, i.e., up, down, left, and right for each state. ... To evaluate the impact of Order DQN and Ada Order DQN, we choose three common deep reinforcement learning games from the PyGame Learning Environment [Urtans and Nikitenko, 2018] and MinAtar [Young and Tian, 2019]: Pixelcopter, Breakout, and Asterix. |
| Dataset Splits | No | For the Pixelcopter environment, we set |D| = 10,000, V = 200, and α = 0.001. ε decreases linearly from 1.0 to 0.01 over 1,000 steps and is fixed at 0.01 thereafter. For the Breakout and Asterix environments, we set |D| = 100,000, V = 1,000, and α = 0.01. ε decreases linearly from 1.0 to 0.1 over 100,000 steps and is fixed at 0.1 thereafter. The paper does not explicitly specify a validation dataset split, only training parameters. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using the 'PyGame Learning Environment' and 'MinAtar' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Following [Hasselt, 2010; Zhu and Rigotti, 2021; Pentaliotis and Wiering, 2021], we set γ = 0.95, α = 1/n(s,a)^0.8, and ε = 1/n(s)^0.5 for the Multi-armed bandit and Gridworld environments; and set γ = 1.0, α = 0.1, and ε = 0.1 for the MDP environment. ... For the Pixelcopter environment, we set |D| = 10,000, V = 200, and α = 0.001. ε decreases linearly from 1.0 to 0.01 over 1,000 steps and is fixed at 0.01 thereafter. For the Breakout and Asterix environments, we set |D| = 100,000, V = 1,000, and α = 0.01. ε decreases linearly from 1.0 to 0.1 over 100,000 steps and is fixed at 0.1 thereafter. (A schedule sketch follows the table.) |
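
To make the quoted hyperparameter schedules concrete, here is a minimal Python sketch of the schedules described in the Dataset Splits and Experiment Setup rows: the count-based schedules α = 1/n(s,a)^0.8 and ε = 1/n(s)^0.5 for the tabular environments, and the linear ε annealing for the deep RL environments. The function names, visit-count dictionaries, and the point at which counts are incremented are our own illustrative assumptions, not identifiers or details from the authors' released code.

```python
from collections import defaultdict

# Visit counts; assumed to be incremented on every visit to (s, a) and s
# before the schedules are queried, so the denominators are never zero.
n_sa = defaultdict(int)
n_s = defaultdict(int)

def tabular_alpha(s, a):
    """Learning rate alpha = 1 / n(s, a)^0.8 (Multi-armed bandit, Gridworld)."""
    return 1.0 / max(n_sa[(s, a)], 1) ** 0.8

def tabular_epsilon(s):
    """Exploration rate epsilon = 1 / n(s)^0.5 (Multi-armed bandit, Gridworld)."""
    return 1.0 / max(n_s[s], 1) ** 0.5

def linear_epsilon(step, eps_start=1.0, eps_end=0.01, decay_steps=1_000):
    """Linear annealing used in the DQN experiments: epsilon decreases from
    eps_start to eps_end over decay_steps environment steps and then stays
    at eps_end. Defaults match the quoted Pixelcopter setting; Breakout and
    Asterix would use eps_end=0.1 and decay_steps=100_000."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Example: the Pixelcopter schedule at a few steps (1.0, 0.505, 0.01, 0.01).
if __name__ == "__main__":
    for step in (0, 500, 1_000, 5_000):
        print(step, round(linear_epsilon(step), 3))
```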