Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
Authors: Beyazit Yalcinkaya, Niklas Lauffer, Marcell Vazquez-Chanlatte, Sanjit Seshia
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluation, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization without the myopic suboptimality of hierarchical methods. |
| Researcher Affiliation | Collaboration | Beyazit Yalcinkaya University of California, Berkeley... Niklas Lauffer University of California, Berkeley... Marcell Vazquez-Chanlatte Nissan Advanced Technology Center... Sanjit A. Seshia University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1: RAD cDFA Sampler |
| Open Source Code | Yes | For more information about the project, visit: https://rad-embeddings.github.io/. |
| Open Datasets | Yes | Letterworld Environment. Introduced in [4, 38], Letterworld is a 7x7 grid where the agent occupies one square at a time. |
| Dataset Splits | No | The paper describes the training and evaluation procedures in the context of reinforcement learning, which involves training an agent through interaction with an environment and evaluating performance on sampled tasks/episodes. It does not explicitly define traditional training, validation, and test splits for a static dataset. |
| Hardware Specification | Yes | Each seed in the experiments section was run as an individual Slurm job with access to 4 cores of an AMD EPYC 7763 running at 2.45 GHz and at most 20 GB of memory. |
| Software Dependencies | No | The paper mentions specific algorithms and network architectures like GATv2 [9], RGCN [34], and PPO [35], but does not provide version numbers for any specific software libraries or dependencies (e.g., PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 1 shows the hyperparameters used for every training run in the experiments section. |
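The cDFA tasks mentioned in the table can be illustrated with a minimal sketch. It assumes the conjunctive case, in which a word is accepted only if every component DFA accepts it. The classes `DFA` and `ConjunctiveCDFA` and the helper `reach_avoid` are hypothetical illustrations, not the authors' code and not Algorithm 1 (the RAD cDFA Sampler).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class DFA:
    """A minimal deterministic finite automaton over string symbols (illustrative only)."""
    start: int
    accepting: FrozenSet[int]
    # transitions[(state, symbol)] -> next state; missing entries are treated as self-loops
    transitions: Dict[Tuple[int, str], int]

    def step(self, state: int, symbol: str) -> int:
        return self.transitions.get((state, symbol), state)

    def accepts(self, word: Iterable[str]) -> bool:
        state = self.start
        for symbol in word:
            state = self.step(state, symbol)
        return state in self.accepting


def reach_avoid(goal: str, avoid: str) -> DFA:
    """Hypothetical helper: accept once `goal` is seen, unless `avoid` was seen first."""
    # State 0: waiting; state 1: goal reached (accepting sink); state 2: failed (rejecting sink).
    return DFA(start=0, accepting=frozenset({1}), transitions={(0, goal): 1, (0, avoid): 2})


@dataclass
class ConjunctiveCDFA:
    """Assumed conjunctive composition: the word must be accepted by every component."""
    components: List[DFA]

    def accepts(self, word: Iterable[str]) -> bool:
        symbols = list(word)  # materialize so each component reads the same word
        return all(dfa.accepts(symbols) for dfa in self.components)
```

For example, `ConjunctiveCDFA([reach_avoid("a", "b"), reach_avoid("c", "d")])` accepts `["a", "c"]` but rejects `["a", "d", "c"]`, because the second component sees `d` before `c`. Treating missing transitions as self-loops keeps each component small; an agent acting in an environment would more likely advance each component's state one symbol at a time than re-run whole words.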
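For readers unfamiliar with Letterworld, the toy version below sketches a 7×7 grid with a single agent square. Only the grid size and the one-square agent come from the source. The `LetterGrid` class, the letter alphabet and placement, the four-move action set, and the edge-clamping rule are assumptions for illustration and may differ from the environment introduced in [4, 38].

```python
import random
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]
# Assumed action set: four axis-aligned moves.
MOVES: Dict[str, Pos] = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class LetterGrid:
    """Hypothetical toy Letterworld: the agent occupies one square; some squares carry letters."""

    def __init__(self, letters: str = "abcdef", size: int = 7, seed: int = 0):
        if len(letters) >= size * size:
            raise ValueError("need at least one unlabeled square for the agent")
        self.size = size
        rng = random.Random(seed)
        cells = [(r, c) for r in range(size) for c in range(size)]
        rng.shuffle(cells)
        # Assumed layout: each letter on one random square.
        self.labels: Dict[Pos, str] = {cell: letter for cell, letter in zip(cells, letters)}
        # The agent starts on the first square left unlabeled by the shuffle.
        self.agent: Pos = cells[len(letters)]

    def step(self, action: str) -> Optional[str]:
        """Move the agent and return the letter on its new square, or None."""
        dr, dc = MOVES[action]
        r, c = self.agent
        # Assumption: moves past the edge are clamped, so the agent stays on the boundary.
        self.agent = (
            min(max(r + dr, 0), self.size - 1),
            min(max(c + dc, 0), self.size - 1),
        )
        return self.labels.get(self.agent)
```

The sequence of letters returned by `step` forms the word that a task automaton would read, which is how a goal-conditioned policy's progress on a DFA task can be tracked during an episode.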