Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

Authors: John Co-Reyes, YuXuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experimental evaluation, we aim to address the following questions: (1) Can we learn good exploratory behavior in the absence of task reward, using SeCTAR with our proposed exploration method? (2) Can we use the learned latent space with planning and exploration in the loop to solve hierarchical and sparse reward tasks? (3) Does the state decoder model make meaningful predictions about the outcomes of the high-level actions? We evaluate our method on four different domains: 2D navigation, object manipulation, wheeled locomotion, and swimmer navigation, which are shown in Figure 3. We compare our full Algorithm 2 against several baseline methods for exploration, hierarchy, and model-based control.
Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Google Brain.
Pseudocode | Yes | Algorithm 1: Model predictive control in latent space (an illustrative sketch of this procedure follows the table).
Open Source Code | Yes | All our results, videos, and experimental details can be found at https://sites.google.com/view/sectar
Open Datasets | No | The paper describes custom simulated environments (2D navigation, object manipulation, wheeled locomotion, swimmer navigation) but does not provide any access information (link, DOI, formal citation) for any publicly available or open dataset. It implicitly uses data generated within these environments during training.
Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits. It describes experimental evaluations on custom simulated tasks without explicit data partitioning information.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup | No | The paper mentions using PPO and describes the general training procedure, but does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings in the main text. It states that 'Details of the experimental evaluation can be found in the appendix', but the appendix is not available.
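The Pseudocode row above refers to the paper's Algorithm 1, model predictive control in a learned latent space. The minimal sketch below illustrates one plausible reading of that procedure: candidate plans are sequences of latent codes sampled from a unit Gaussian prior, the state decoder predicts the outcome of each candidate, and only the first latent of the best-scoring plan is executed with the low-level controller before replanning. All names, signatures, and default values here (latent_mpc, env_step, state_decoder, reward_fn, the horizon and sample counts) are hypothetical placeholders, not the authors' released code.

```python
# A minimal sketch of model predictive control in a learned latent space,
# in the spirit of the paper's Algorithm 1. Function and argument names
# are illustrative assumptions, not the authors' implementation.
import numpy as np

def latent_mpc(env_step, state_decoder, reward_fn, init_state,
               latent_dim=8, horizon=4, n_candidates=256,
               segment_len=20, n_replans=10, rng=None):
    """Plan over sequences of latent codes; execute the best first code.

    env_step(state, z, segment_len) -> next_state
        Runs the low-level (policy-decoder) controller conditioned on z
        for segment_len environment steps and returns the resulting state.
    state_decoder(state, z) -> predicted_states, shape [segment_len, state_dim]
        Predicts the state trajectory that executing z from `state` would yield.
    reward_fn(predicted_states) -> float
        Scores a predicted state segment (e.g., a sparse goal reward).
    """
    rng = rng or np.random.default_rng(0)
    state = init_state
    for _ in range(n_replans):
        # Sample candidate plans: sequences of latent codes from the prior.
        plans = rng.standard_normal((n_candidates, horizon, latent_dim))
        returns = np.zeros(n_candidates)
        for i, plan in enumerate(plans):
            s = state
            for z in plan:
                pred = state_decoder(s, z)     # predicted segment of states
                returns[i] += reward_fn(pred)  # score the predicted outcome
                s = pred[-1]                   # chain predictions forward
        best = plans[np.argmax(returns)]
        # Execute only the first latent of the best plan, then replan (MPC).
        state = env_step(state, best[0], segment_len)
    return state
```

Replanning after executing only the first latent of the chosen plan is what makes this receding-horizon (MPC-style) control rather than open-loop execution of a full plan; the number of candidates and the planning horizon trade off planning cost against plan quality.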