CQM: Curriculum Reinforcement Learning with a Quantized World Model

Authors: Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The main goal of the experiments is to demonstrate the capability of the proposed method (CQM) to suggest a well-calibrated curriculum and to lead to more sample-efficient learning, composing the goal space from an arbitrary observation space. To this end, the paper provides both qualitative and quantitative results on seven goal-reaching tasks, including two visual control tasks that receive raw pixel observations from bird's-eye and ego-centric views, respectively.
Researcher Affiliation | Academia | Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim (Seoul National University; Automation and Systems Research Institute (ASRI); Artificial Intelligence Institute of Seoul National University (AIIS)). Emails: {ysz0301, dscho1234, bdfire1234, hjinkim}@snu.ac.kr
Pseudocode | Yes | Appendix A.2 provides Algorithm 1, an overview of CQM.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset; it refers to simulated environments/tasks rather than traditional datasets with explicit access details.
Dataset Splits | No | The paper does not provide dataset split information needed to reproduce data partitioning, as it focuses on reinforcement learning in simulated environments rather than static datasets with explicit train/validation/test splits.
Hardware Specification | Yes | Experiments were performed using an NVIDIA RTX A5000 and an AMD Ryzen 2950X, and the entire training process took approximately 0.5-2 days, depending on the task.
Software Dependencies | No | The paper mentions software such as the TD3 algorithm, SAC, Scikit-learn, and Python, but does not provide specific version numbers for these components.
Experiment Setup | Yes | Table 2, hyperparameters for CQM:
  # of initial rollouts: 20
  HER [1] future step: 150
  batch size (state): 1024
  batch size (IMG): 128
  HER ratio (critic Q): 0.8
  HER ratio (graph Q): 1.0
  max graph nodes: 300
  graph update cycle M: 5
  critic hidden dim: 256
  discount factor γ: 0.99
  critic hidden depth: 3
  RL buffer 𝓑 size: 2,500,000
  actor ϕ learning rate: 0.0001
  critic Q learning rate: 0.001
  interpolation factor (target Q): 0.995
  target network update freq: 10
  actor update freq: 2
  # of VQ-VAE embeddings: 128
  VQ-VAE latent dimension: 64 (32 for -Viz)
  RL optimizer: Adam