Hierarchical Imitation Learning with Vector Quantized Models

Authors: Kalle Kujanpää, Joni Pajarinen, Alexander Ilin

ICML 2023

Reproducibility assessment. Each variable is listed with its Result and the supporting LLM Response:
Research Type: Experimental
LLM Response: "In experiments, the algorithm excels at solving complex, long-horizon decision-making problems outperforming state-of-the-art."
Researcher Affiliation: Academia
LLM Response: "Department of Computer Science, Aalto University, Finland; Finnish Center for Artificial Intelligence (FCAI); Department of Electrical Engineering and Automation, Aalto University, Finland."
Pseudocode: Yes
LLM Response: "Algorithm 1: Segmenting Trajectories for Hierarchical IL"; "The pseudocode for our VQVAE training is given in Algorithm 2 in Appendix I."; "Listing 1 contains the pseudo-code describing our high-level search with a priority queue."
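The paper's Listing 1 gives PyTorch pseudocode for the priority-queue search; as a rough illustration of what such a best-first high-level search over subgoals looks like, here is a minimal self-contained Python sketch. The helper names (`propose_subgoals`, `cost`, `heuristic`) and the tuple layout are our own assumptions, not the paper's interface.

```python
import heapq

def high_level_search(start, is_goal, propose_subgoals, cost, heuristic,
                      max_pops=10_000):
    """Best-first search over subgoal states using a priority queue.

    propose_subgoals(state) yields candidate next subgoals (in the paper these
    come from a learned generator over discrete VQVAE codes); cost(s, g)
    estimates the low-level cost of reaching g from s; heuristic(g) estimates
    the remaining cost-to-go.
    """
    tie = 0  # tie-breaker so heapq never compares raw state objects
    frontier = [(heuristic(start), tie, 0.0, start, [start])]
    closed = set()
    for _ in range(max_pops):
        if not frontier:
            break
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path  # sequence of subgoals from start to goal
        if state in closed:
            continue
        closed.add(state)
        for sub in propose_subgoals(state):
            if sub in closed:
                continue
            tie += 1
            g_new = g + cost(state, sub)
            heapq.heappush(frontier,
                           (g_new + heuristic(sub), tie, g_new, sub, path + [sub]))
    return None  # search budget exhausted without reaching the goal
```

The tie counter keeps the heap ordering well-defined when priorities collide, which matters because environment states are generally not comparable.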
Open Source Code: No
LLM Response: The paper cites open-source implementations of the baseline algorithms (e.g., d3rlpy for CQL, the original authors' implementations for others) but provides no link to, or explicit statement about releasing, source code for the method it describes.
Open Datasets: No
LLM Response: The paper describes how the training data were collected for each environment ("In Sokoban, we collect a training set of 10340 trajectories using gym-sokoban (Schrader, 2018)."; "The training set consists of 5100 trajectories in STP and 22100 in Box-World...") but provides no concrete access information (e.g., a link, DOI, or repository) for these collected datasets.
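For context on the quoted data collection, a minimal sketch of gathering trajectories with gym-sokoban follows; the random policy is a stand-in (the paper collects expert trajectories), and the episode and step budgets are illustrative.

```python
import gym
import gym_sokoban  # registers the Sokoban-v0 environment (Schrader, 2018)

def collect_trajectories(n_episodes, max_steps=120):
    """Roll out a policy in Sokoban and store (obs, action, reward) tuples."""
    env = gym.make("Sokoban-v0")
    dataset = []
    for _ in range(n_episodes):
        obs = env.reset()
        traj = []
        for _ in range(max_steps):
            action = env.action_space.sample()  # stand-in for an expert policy
            next_obs, reward, done, _ = env.step(action)
            traj.append((obs, action, reward))
            obs = next_obs
            if done:
                break
        dataset.append(traj)
    return dataset
```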
Dataset Splits: No
LLM Response: The paper mentions a "training set" and "test problems" but does not specify explicit training/validation/test splits (such as percentages or counts per subset) or reference a predefined split methodology.
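Since the paper reports only a training set and test problems, one conventional way to carve a validation split out of collected trajectories is sketched below; the 90/10 fraction and seed are illustrative assumptions, not values from the paper.

```python
import random

def split_trajectories(trajectories, val_frac=0.1, seed=0):
    """Shuffle trajectories reproducibly and hold out a validation fraction."""
    shuffled = trajectories[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

# e.g., splitting the 10340 Sokoban trajectories mentioned above:
train_set, val_set = split_trajectories(list(range(10340)))
```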
Hardware Specification: Yes
LLM Response: "We trained our models on an HPC cluster using one NVIDIA GPU and multiple CPU workers per run. Most runs were performed on V100 GPUs with 32 GB of GDDR SDRAM. For some of the runs, the GPU was an A100, a K80, or a P100. We used 6 CPU workers per GPU with 10 GB of RAM per worker and each worker running on one core. By reducing the number of workers, it is possible to train and evaluate the agent on a workstation with an Intel i7-8086K, 16 GB of RAM, and an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of video memory."
Software Dependencies: No
LLM Response: The paper mentions software such as PyTorch (Paszke et al., 2019) and the Adam optimizer (Kingma & Ba, 2015) and includes "PyTorch pseudocode", but it does not specify the version numbers of these dependencies, which would be required for exact replication.
Experiment Setup: Yes
LLM Response:

Table 11: General hyperparameters of our method.

Parameter | Value
Learning rate for dynamics | 2 × 10^-4
Learning rate for π, d, V | 1 × 10^-3
Learning rate for VQVAE | 2 × 10^-4
Discount rate for REINFORCE | 0.99

Table 12: Environment-specific hyperparameters of our method.

Parameter | Explanation | Sokoban | STP | Box-World | TSP
α | Subgoal penalty | 0.1 | 0.1 | 0.1 | 0.05
β | Beta for VQVAE | 0.1 | 0.1 | 0.1 | 0
c | Exploration constant for MCTS | 0.1 | | |
D | Codebook dimensionality | 128 | 128 | 128 | 64
H | Subgoal horizon | 10 | 10 | 10 | 50
K | VQVAE codebook size | 64 | 64 | 64 | 32
k | Segment length w/o REINFORCE | 5 | 5 | 5 | 4
(N, D) | DRC size | (3, 3) | | |
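For convenience, the values in Tables 11 and 12 can be transcribed into a single configuration; the dictionary layout and key names below are our own convention, not the authors' code (the single listed values for c and the DRC size are kept global).

```python
# Hyperparameters transcribed from Tables 11-12 of the paper. Key names and
# grouping are illustrative; they do not come from the authors' codebase.
GENERAL = {
    "lr_dynamics": 2e-4,         # learning rate for the dynamics model
    "lr_pi_d_value": 1e-3,       # learning rate for pi, d, and V
    "lr_vqvae": 2e-4,            # learning rate for the VQVAE
    "reinforce_discount": 0.99,  # discount rate for REINFORCE
    "mcts_c": 0.1,               # exploration constant for MCTS (single value listed)
    "drc_size": (3, 3),          # (N, D) DRC size (single value listed)
}

PER_ENV = {
    # alpha: subgoal penalty, beta: VQVAE beta, D: codebook dimensionality,
    # H: subgoal horizon, K: codebook size, k: segment length w/o REINFORCE
    "sokoban":   dict(alpha=0.10, beta=0.1, D=128, H=10, K=64, k=5),
    "stp":       dict(alpha=0.10, beta=0.1, D=128, H=10, K=64, k=5),
    "box_world": dict(alpha=0.10, beta=0.1, D=128, H=10, K=64, k=5),
    "tsp":       dict(alpha=0.05, beta=0.0, D=64,  H=50, K=32, k=4),
}
```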