PcLast: Discovering Plannable Continuous Latent States

Authors: Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan P Molu, Miroslav Dudík, John Langford, Alex Lamb

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sampling efficiency. Further, in reward-free settings this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples. ... In this section, we address the following questions via experimentation over environments of different complexities: (1) Does the PCLAST representation lead to performance gains in reward-based and reward-free goal-conditioned tasks? (2) Does increasing abstraction levels lead to more computationally efficient and better plans? (3) What is the effect of PCLAST map on abstraction? ... Table 1. Impact of different representations on policy learning and planning.
Researcher Affiliation | Collaboration | 1 Microsoft Research; 2 Work done as Intern at Microsoft, NYC; 3 Mila Quebec AI Institute; 4 ETS Montreal; 5 New York University; 6 McGill University; 7 Meta.
Pseudocode | Yes | Algorithm 1: n-Level Planner; Algorithm 2: High-Level Planner; Algorithm 3: Cross-Entropy Method (a hedged sketch of a CEM planner appears after the table).
Open Source Code | Yes | Code for reproducing our experimental results can be found at https://github.com/shivakanthsujit/pclast
Open Datasets | Yes | Exogenous Noise Mujoco. We adopted the control tasks Cheetah-Run and Walker-walk from the visual-d4rl (Lu et al., 2022) benchmark, which provides offline transition datasets of various qualities.
Dataset Splits | No | The paper mentions collecting offline datasets (e.g., 500K transitions) and using them for training, but does not specify explicit train/validation/test splits with percentages, sample counts, or references to predefined splits for these datasets within the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions the use of the Adam optimizer, but does not provide specific version numbers for general software dependencies, libraries, or simulation environments used for the experiments.
Experiment Setup | Yes | These networks are trained using the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 1e-3 over batches of size 512. Unless specified otherwise, we sample transitions with K_max = 10. In the following, we discuss each of the networks. ... We use the same encoder output (ŝ) dimension of 256 for all our experiments. ... We divide the action space of the considered environments into 20 bins. During backprop, the mean-square loss and categorical loss are scaled by 10 and 0.01, respectively. ... For training, we sample positive examples using d_m = 2, and negative samples are drawn randomly. (A hedged training-step sketch based on these hyperparameters follows the table.)
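
The Pseudocode row lists a Cross-Entropy Method planner (Algorithm 3). The paper's own algorithm is not reproduced here; the following is a minimal, hedged sketch of a generic CEM planner over latent states, under the assumption of a learned latent dynamics function `dynamics(z, a)` and a goal-distance cost. All names and hyperparameter defaults in this block are illustrative, not the authors' implementation.

```python
# Hypothetical CEM planner sketch (generic, not the paper's Algorithm 3 verbatim).
import numpy as np

def cem_plan(z0, z_goal, dynamics, horizon=10, pop_size=256,
             n_elite=32, n_iters=5, action_dim=2):
    """Return the first action of the lowest-cost sampled action sequence."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        actions = mean + std * np.random.randn(pop_size, horizon, action_dim)
        costs = np.zeros(pop_size)
        for i in range(pop_size):
            z = z0
            for t in range(horizon):
                z = dynamics(z, actions[i, t])      # roll out latent dynamics
            costs[i] = np.linalg.norm(z - z_goal)   # distance to goal latent
        # Refit the sampling distribution to the lowest-cost (elite) sequences.
        elite = actions[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]
```

In a model-predictive-control loop, a caller would re-plan at every step and execute only the returned first action.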
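
The Experiment Setup row quotes concrete hyperparameters. Below is a minimal training-step sketch, assuming a PyTorch implementation with a forward-dynamics (mean-square) loss and an inverse-dynamics (categorical) loss; only the quoted values (Adam, lr 1e-3, batch size 512, 256-dim encoder output, 20 action bins, loss weights 10 and 0.01) come from the paper, while the architectures, input dimension, and loss pairing are assumptions for illustration.

```python
# Hedged sketch of the quoted training configuration (not the authors' code).
import torch
import torch.nn as nn

LATENT_DIM, N_ACTION_BINS, OBS_DIM = 256, 20, 64   # OBS_DIM is a placeholder

encoder = nn.Sequential(nn.Linear(OBS_DIM, 512), nn.ReLU(), nn.Linear(512, LATENT_DIM))
forward_model = nn.Linear(LATENT_DIM + N_ACTION_BINS, LATENT_DIM)   # predicts next latent
inverse_model = nn.Linear(2 * LATENT_DIM, N_ACTION_BINS)            # predicts binned action

params = (list(encoder.parameters()) + list(forward_model.parameters())
          + list(inverse_model.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)        # learning rate 1e-3, as quoted

def training_step(obs, next_obs, action_bin):
    """One update on a batch (intended batch size: 512 transitions)."""
    z, z_next = encoder(obs), encoder(next_obs)
    action_onehot = nn.functional.one_hot(action_bin, N_ACTION_BINS).float()
    # Mean-square forward-dynamics loss, scaled by 10 (as quoted).
    mse = nn.functional.mse_loss(forward_model(torch.cat([z, action_onehot], -1)), z_next)
    # Categorical inverse-dynamics loss over 20 action bins, scaled by 0.01 (as quoted).
    ce = nn.functional.cross_entropy(inverse_model(torch.cat([z, z_next], -1)), action_bin)
    loss = 10.0 * mse + 0.01 * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```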