CQM: Curriculum Reinforcement Learning with a Quantized World Model
Authors: Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The main goal of the experiments is to demonstrate the capability of the proposed method (CQM) to suggest a well-calibrated curriculum and lead to more sample-efficient learning, composing the goal space from the arbitrary observation space. To this end, we provide both qualitative and quantitative results in seven goal-reaching tasks including two visual control tasks, which receive the raw pixel observations from bird's-eye and ego-centric views, respectively. |
| Researcher Affiliation | Academia | Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim; Seoul National University; Automation and Systems Research Institute (ASRI); Artificial Intelligence Institute of Seoul National University (AIIS); {ysz0301, dscho1234, bdfire1234, hjinkim}@snu.ac.kr |
| Pseudocode | Yes | A.2 Algorithm Algorithm 1 Overview of CQM |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. |
| Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset; it refers to simulated environments/tasks rather than traditional datasets with explicit access details. |
| Dataset Splits | No | The paper does not provide the dataset split information needed to reproduce data partitioning, as it focuses on reinforcement learning in simulated environments rather than static datasets with explicit train/validation/test splits. |
| Hardware Specification | Yes | Our experiments have been performed using an NVIDIA RTX A5000 and AMD Ryzen 2950X, and the entire training process took approximately 0.5-2 days, depending on the tasks. |
| Software Dependencies | No | The paper mentions software such as the TD3 algorithm, SAC, Scikit-learn, and Python, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | Table 2: Hyperparameters for CQM. # of initial rollouts: 20; HER [1] future step: 150; batch size (state): 1024; batch size (IMG): 128; HER ratio (critic Q): 0.8; HER ratio (graph Q): 1.0; max graph nodes: 300; graph update cycle M: 5; critic hidden dim: 256; discount factor γ: 0.99; critic hidden depth: 3; RL buffer B size: 2,500,000; actor ϕ learning rate: 0.0001; critic Q learning rate: 0.001; interpolation factor (target Q): 0.995; target network update freq: 10; actor update freq: 2; # of VQ-VAE embeddings: 128; VQ-VAE latent dimension: 64 (-Viz: 32); RL optimizer: Adam |
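
For anyone attempting a re-implementation, the reported hyperparameters can be collected into a single configuration object. The sketch below is only an illustrative organization, assuming a Python re-implementation; the key names are our own invention, but every value is taken verbatim from the Table 2 listing above.

```python
# Hypothetical config dict gathering the CQM hyperparameters reported in Table 2.
# Key names are illustrative; only the values come from the paper.
CQM_HYPERPARAMS = {
    "num_initial_rollouts": 20,
    "her_future_step": 150,
    "batch_size_state": 1024,
    "batch_size_img": 128,
    "her_ratio_critic_q": 0.8,
    "her_ratio_graph_q": 1.0,
    "max_graph_nodes": 300,
    "graph_update_cycle_M": 5,
    "critic_hidden_dim": 256,
    "discount_factor_gamma": 0.99,
    "critic_hidden_depth": 3,
    "rl_buffer_size": 2_500_000,
    "actor_learning_rate": 1e-4,
    "critic_q_learning_rate": 1e-3,
    "target_q_interpolation_factor": 0.995,  # Polyak averaging coefficient for the target Q
    "target_network_update_freq": 10,
    "actor_update_freq": 2,
    "num_vqvae_embeddings": 128,
    "vqvae_latent_dim": 64,  # 32 for the visual (-Viz) tasks, per the paper
    "rl_optimizer": "adam",
}
```

Keeping these values in one dict (or a YAML/JSON file loaded into one) makes it easy to log the exact setup alongside each run, which is the main hurdle the missing code release creates for reproduction.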