Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Policy Gradient With Serial Markov Chain Reasoning

Authors: Edoardo Cetin, Oya Celiktutan

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We obtain state-of-the-art results for popular benchmarks from the OpenAI Gym Mujoco suite [29] and the DeepMind Control suite from pixels [30]. We evaluate the serial Markov chain reasoning framework by comparing its performance with current state-of-the-art baselines based on traditional RL. We consider 6 challenging Mujoco tasks from Gym [29, 56] and 12 pixel-based tasks from the DeepMind Control Suite (DMC) [30].
Researcher Affiliation	Academia	Edoardo Cetin, Department of Engineering, King's College London; Oya Celiktutan, Department of Engineering, King's College London
Pseudocode	Yes	Algorithm 1 Agent Acting input: s, current state a0 ˆA N 0 Rp +1 while Rp > 1.1 do ( \|a N) N N + 1 Update Rp with a1:N . Eq.16 ˆ N ˆ N + (1 )N . 2 [0, 1) ˆA ˆA [ a1:N output: a a1:N Algorithm 2 Agent Learning input: D, data buffer (s, a, s0, r) D a0 b ( \|a, s0) for n 0, d ˆ Ne do Qs φ(s0, an) . Eq. 8 n+1 N(0, 1), an+1 = f b(an, s, n+1) r Qs n) . Thm. 3.2 arg min J( ) . Eq. 6 a0 a1:d ˆ Ne arg minφ J(φ) . Eq. 7
Open Source Code	Yes	We provide our implementation for transparency and to facilitate future extensions at sites.google.com/view/serial-mcr/.
Open Datasets	Yes	We consider 6 challenging Mujoco tasks from Gym [29, 56] and 12 pixel-based tasks from the DeepMind Control Suite (DMC) [30].
Dataset Splits	No	The paper does not explicitly provide specific train/validation/test dataset splits with percentages or absolute counts for reproducibility. It mentions evaluation rollouts but not data partitioning for model training.
Hardware Specification	No	The paper states "See Section D of the Appendix" for details on compute and resources used. However, Appendix D is not provided in the given text, so specific hardware details are not available.
Software Dependencies	No	The paper mentions general components such as 'MaxEnt RL' and 'Rliable' but, within the provided text, gives no version numbers for any software dependency.
Experiment Setup	No	The paper refers to 'App. C or the code for full details' regarding design choices and training procedures, and 'App. E' for further ablation studies. However, these appendices are not included in the provided text, so specific experimental setup details are not available.