CQM: Curriculum Reinforcement Learning with a Quantized World Model
Authors: Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The main goal of the experiments is to demonstrate the capability of the proposed method (CQM) to suggest a well-calibrated curriculum and lead to more sample-efficient learning, composing the goal space from the arbitrary observation space. To this end, we provide both qualitative and quantitative results in seven goal-reaching tasks including two visual control tasks, which receive the raw pixel observations from bird's-eye and ego-centric views, respectively. |
| Researcher Affiliation | Academia | Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim; Seoul National University; Automation and Systems Research Institute (ASRI); Artificial Intelligence Institute of Seoul National University (AIIS); {ysz0301, dscho1234, bdfire1234, hjinkim}@snu.ac.kr |
| Pseudocode | Yes | A.2 Algorithm Algorithm 1 Overview of CQM |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. |
| Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset; it refers to simulated environments/tasks rather than traditional datasets with explicit access details. |
| Dataset Splits | No | The paper does not provide the dataset split information needed to reproduce data partitioning, as it focuses on reinforcement learning in simulated environments rather than static datasets with explicit train/validation/test splits. |
| Hardware Specification | Yes | Our experiments have been performed using an NVIDIA RTX A5000 and AMD Ryzen 2950X, and the entire training process took approximately 0.5-2 days, depending on the tasks. |
| Software Dependencies | No | The paper mentions software such as the TD3 algorithm, SAC, Scikit-learn, and Python, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | Table 2: Hyperparameters for CQM. # of initial rollouts: 20; HER [1] future step: 150; batch size (state): 1024; batch size (IMG): 128; HER ratio (critic Q): 0.8; HER ratio (graph Q): 1.0; max graph nodes: 300; graph update cycle M: 5; critic hidden dim: 256; discount factor γ: 0.99; critic hidden depth: 3; RL buffer B size: 2,500,000; actor ϕ learning rate: 0.0001; critic Q learning rate: 0.001; interpolation factor (target Q): 0.995; target network update freq: 10; actor update freq: 2; # of VQ-VAE embeddings: 128; VQ-VAE latent dimension: 64 (-Viz: 32); RL optimizer: Adam |
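
For anyone attempting a re-implementation, the reported hyperparameters can be collected into a single configuration object. The sketch below is only an illustrative organization, assuming a Python re-implementation; the key names are our own invention, but every value is taken verbatim from the Table 2 listing above.

```python
# Hypothetical config dict gathering the CQM hyperparameters reported in Table 2.
# Key names are illustrative; only the values come from the paper.
CQM_HYPERPARAMS = {
    "num_initial_rollouts": 20,
    "her_future_step": 150,
    "batch_size_state": 1024,
    "batch_size_img": 128,
    "her_ratio_critic_q": 0.8,
    "her_ratio_graph_q": 1.0,
    "max_graph_nodes": 300,
    "graph_update_cycle_M": 5,
    "critic_hidden_dim": 256,
    "discount_factor_gamma": 0.99,
    "critic_hidden_depth": 3,
    "rl_buffer_size": 2_500_000,
    "actor_learning_rate": 1e-4,
    "critic_q_learning_rate": 1e-3,
    "target_q_interpolation_factor": 0.995,  # Polyak averaging coefficient for the target Q
    "target_network_update_freq": 10,
    "actor_update_freq": 2,
    "num_vqvae_embeddings": 128,
    "vqvae_latent_dim": 64,  # 32 for the visual (-Viz) tasks, per the paper
    "rl_optimizer": "adam",
}
```

Keeping these values in one dict (or a YAML/JSON file loaded into one) makes it easy to log the exact setup alongside each run, which is the main hurdle the missing code release creates for reproduction.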