Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning

Authors: Beyazit Yalcinkaya, Niklas Lauffer, Marcell Vazquez-Chanlatte, Sanjit A. Seshia

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluation, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization without the myopic suboptimality of hierarchical methods. |
| Researcher Affiliation | Collaboration | Beyazit Yalcinkaya, University of California, Berkeley... Niklas Lauffer, University of California, Berkeley... Marcell Vazquez-Chanlatte, Nissan Advanced Technology Center... Sanjit A. Seshia, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 RAD cDFA Sampler (a hedged sketch of such a sampler follows the table) |
| Open Source Code | Yes | For more information about the project, visit: https://rad-embeddings.github.io/. |
| Open Datasets | Yes | Letterworld Environment. Introduced in [4, 38], Letterworld is a 7x7 grid where the agent occupies one square at a time. (A minimal environment sketch follows the table.) |
| Dataset Splits | No | The paper describes training and evaluation in a reinforcement-learning setting: the agent is trained by interacting with an environment and evaluated on sampled tasks and episodes. It does not define traditional training/validation/test splits over a static dataset. |
| Hardware Specification | Yes | Each seed in the experiments section was run as an individual Slurm job with access to 4 cores of an AMD EPYC 7763 running at 2.45 GHz and at most 20 GB of memory. |
| Software Dependencies | No | The paper names specific algorithms and network architectures, such as GATv2 [9], RGCN [34], and PPO [35], but does not give version numbers for any software libraries or dependencies (e.g., PyTorch or TensorFlow versions). |
| Experiment Setup | Yes | Table 1 shows the hyperparameters used for every training run in the experiments section. (An illustrative mapping of such a table onto a PPO run follows the table.) |
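The paper's Algorithm 1 (the RAD cDFA sampler) is available only as pseudocode in the paper itself. As a rough illustration of what a reach-avoid-derived DFA sampler can look like, here is a minimal Python sketch; the name `sample_rad_dfa`, the transition-table encoding, and all sampling choices are assumptions for illustration, not the paper's actual algorithm.

```python
import random

def sample_rad_dfa(alphabet, max_depth=3, rng=random):
    """Sample a small reach-avoid style DFA over `alphabet`.

    Encoding (assumed, not the paper's): states 0..n-1 form a chain,
    state n accepts, and -1 is a rejecting sink for "avoid" tokens.
    """
    n = rng.randint(1, max_depth)  # number of reach steps in the chain
    delta = {}
    for state in range(n):
        reach = rng.choice(sorted(alphabet))                 # token that advances
        avoid = rng.sample(sorted(alphabet - {reach}), k=1)  # token that fails
        for token in alphabet:
            if token == reach:
                delta[(state, token)] = state + 1  # progress toward acceptance
            elif token in avoid:
                delta[(state, token)] = -1         # rejecting sink
            else:
                delta[(state, token)] = state      # irrelevant token: self-loop
    return {"num_states": n + 1, "accepting": n, "transitions": delta}

if __name__ == "__main__":
    print(sample_rad_dfa(set("abcde")))
```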
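To make the Open Datasets row concrete: Letterworld is a 7x7 grid in which the agent occupies one square at a time, and the letter on the occupied square is what the agent observes. The sketch below is an assumed minimal reconstruction; the actual letter layout, alphabet size, and start position in [4, 38] may differ.

```python
import numpy as np

class Letterworld:
    """Minimal sketch of a Letterworld-style environment (details assumed)."""

    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholder letter layout; the paper's actual placement may differ.
        letters = np.array(list("abcdefghijkl" * 5)[:49])
        self.grid = rng.permutation(letters).reshape(7, 7)
        self.pos = (3, 3)  # assumed start at the center square

    def step(self, action):
        dr, dc = self.MOVES[action]
        r = min(max(self.pos[0] + dr, 0), 6)  # clamp to the 7x7 grid
        c = min(max(self.pos[1] + dc, 0), 6)
        self.pos = (r, c)
        return self.grid[r, c]  # letter observed on the occupied square
```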
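The Experiment Setup row points to the paper's Table 1 of hyperparameters, which is not reproduced here. Purely to illustrate how such a table maps onto a PPO [35] training run, the sketch below uses Stable-Baselines3 with a stand-in environment; neither the library choice nor any of the numbers is confirmed by the paper.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# CartPole stands in for the paper's cDFA-conditioned environments, and
# every hyperparameter value below is a placeholder, not Table 1's.
env = gym.make("CartPole-v1")
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.0,
    seed=0,
)
model.learn(total_timesteps=100_000)
```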