Hierarchical Imitation Learning with Vector Quantized Models
Authors: Kalle Kujanpää, Joni Pajarinen, Alexander Ilin
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Aalto University, Finland; ²Finnish Center for Artificial Intelligence FCAI; ³Department of Electrical Engineering and Automation, Aalto University, Finland. |
| Pseudocode | Yes | Algorithm 1 Segmenting Trajectories for Hierarchical IL; The pseudocode for our VQVAE training is given in Algorithm 2 in Appendix I.; Listing 1 contains the pseudo-code describing our high-level search with a priority queue. (Hedged sketches of a generic VQ quantization step and a priority-queue search are given after the table.) |
| Open Source Code | No | The paper cites open-sourced implementations of baseline algorithms (e.g., d3rlpy for CQL, authors' implementations for others), but does not provide a specific link or explicit statement about releasing the source code for the methodology described in this paper. |
| Open Datasets | No | The paper describes the collection of training data for each environment ('In Sokoban, we collect a training set of 10340 trajectories using gym-sokoban (Schrader, 2018).', 'The training set consists of 5100 trajectories in STP and 22100 in Box-World...'), but it does not provide concrete access information (e.g., a link, DOI, or repository) for these collected datasets. |
| Dataset Splits | No | The paper mentions 'training set' and 'test problems' but does not specify clear training/validation/test dataset splits, such as percentages or counts for each subset, or reference a predefined split methodology. |
| Hardware Specification | Yes | We trained our models on an HPC cluster using one NVIDIA GPU and multiple CPU workers per run. Most runs were performed on V100 GPUs with 32 GB of GDDR SDRAM. For some of the runs, the GPU was an A100, a K100, or a P80. We used 6 CPU workers per GPU with 10 GB of RAM per worker and each worker running on one core. By reducing the number of workers, it is possible to train and evaluate the agent on a workstation with Intel i7-8086K, 16 GB of RAM, and an NVIDIA GeForce GTX 1080 Ti GPU with 10 GB of video memory. |
| Software Dependencies | No | The paper mentions software like 'PyTorch (Paszke et al., 2019)' and 'Adam optimizer (Kingma & Ba, 2015)' and includes 'PyTorch pseudocode', but it does not specify explicit version numbers for these software dependencies required for replication. |
| Experiment Setup | Yes | Table 11 (general hyperparameters of our method): learning rate for dynamics 2×10⁻⁴; learning rate for π, d, V 1×10⁻³; learning rate for VQVAE 2×10⁻⁴; discount rate for REINFORCE 0.99. Table 12 (environment-specific hyperparameters, values for Sokoban / STP / Box-World / TSP): α subgoal penalty 0.1 / 0.1 / 0.1 / 0.05; β beta for VQVAE 0.1 / 0.1 / 0.1 / 0; c exploration constant for MCTS 0.1; D codebook dimensionality 128 / 128 / 128 / 64; H subgoal horizon 10 / 10 / 10 / 50; K VQVAE codebook size 64 / 64 / 64 / 32; k segment length w/o REINFORCE 5 / 5 / 5 / 4; (N, D) DRC size (3, 3). (These values are restated as a Python configuration after the table.) |
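
For convenience, the hyperparameters quoted from Tables 11 and 12 can be restated as a plain Python configuration. This is only a transcription of the reported values under key names of our own choosing, not code from the paper; the MCTS exploration constant and the DRC size carry only the single values legible in the extracted tables.

```python
# Hyperparameters transcribed from Tables 11 and 12 of the paper.
# Key names are ours; values follow the quoted tables.

GENERAL_HPARAMS = {
    "lr_dynamics": 2e-4,        # learning rate for the dynamics model
    "lr_policy_d_value": 1e-3,  # learning rate for pi, d, V
    "lr_vqvae": 2e-4,           # learning rate for the VQVAE
    "reinforce_discount": 0.99, # discount rate for REINFORCE
}

# Environment-specific hyperparameters, one tuple per parameter in the
# order (Sokoban, STP, Box-World, TSP).
ENV_HPARAMS = {
    "alpha": (0.1, 0.1, 0.1, 0.05),  # subgoal penalty
    "beta":  (0.1, 0.1, 0.1, 0),     # beta for VQVAE
    "D":     (128, 128, 128, 64),    # codebook dimensionality
    "H":     (10,  10,  10,  50),    # subgoal horizon
    "K":     (64,  64,  64,  32),    # VQVAE codebook size
    "k":     (5,   5,   5,   4),     # segment length w/o REINFORCE
}
# c (MCTS exploration constant) and (N, D) (DRC size) are only partially
# legible in the extraction: c = 0.1 and DRC size (3, 3) are the values shown.
```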
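The paper's VQVAE training pseudocode (Algorithm 2) is referenced above but not reproduced in this report. As a rough illustration of the generic vector-quantization step such a model relies on, here is a minimal PyTorch sketch of nearest-codebook lookup with a straight-through estimator (van den Oord et al., 2017); the function name and tensor shapes are our assumptions, not the authors' implementation.

```python
import torch

def vq_quantize(z_e, codebook):
    """Nearest-neighbor codebook lookup, as in a standard VQ-VAE.

    z_e: (batch, D) encoder outputs; codebook: (K, D) embedding table.
    Illustrative sketch of the generic VQ step only, not the paper's code.
    """
    # Euclidean distance from each encoding to each codebook vector.
    dists = torch.cdist(z_e, codebook)  # (batch, K)
    indices = dists.argmin(dim=1)       # (batch,)
    z_q = codebook[indices]             # (batch, D)
    # Straight-through estimator: gradients flow to the encoder as if
    # quantization were the identity map.
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, indices
```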
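Similarly, Listing 1 (the high-level search with a priority queue) is only cited above. A generic best-first search over subgoals using Python's `heapq` might look like the sketch below; `propose_subgoals`, `score`, and `is_goal` are hypothetical placeholders for the learned subgoal generator, priority estimate, and goal test, and nothing here should be read as the authors' exact algorithm.

```python
import heapq
import itertools

def best_first_subgoal_search(start, propose_subgoals, score, is_goal,
                              max_expansions=1000):
    """Best-first search over (hashable) subgoal states with a priority queue.

    Illustrative only: propose_subgoals, score, and is_goal are hypothetical
    callables standing in for the learned components described in the paper.
    """
    counter = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(-score(start), next(counter), start, [start])]
    visited = set()

    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state, path = heapq.heappop(frontier)  # highest score first
        if is_goal(state):
            return path
        if state in visited:
            continue
        visited.add(state)
        # Expand: the model proposes candidate subgoals reachable from here.
        for subgoal in propose_subgoals(state):
            if subgoal not in visited:
                heapq.heappush(
                    frontier,
                    (-score(subgoal), next(counter), subgoal, path + [subgoal]),
                )
    return None  # no goal found within the expansion budget
```

The monotonically increasing counter in each queue entry is a standard `heapq` idiom: it breaks priority ties so the heap never has to compare the state objects themselves.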