Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
Authors: Beyazit Yalcinkaya, Niklas Lauffer, Marcell Vazquez-Chanlatte, Sanjit Seshia
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluation, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization without the myopic suboptimality of hierarchical methods. |
| Researcher Affiliation | Collaboration | Beyazit Yalcinkaya University of California, Berkeley... Niklas Lauffer University of California, Berkeley... Marcell Vazquez-Chanlatte Nissan Advanced Technology Center... Sanjit A. Seshia University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 RAD cDFA Sampler |
| Open Source Code | Yes | For more information about the project, visit: https://rad-embeddings.github.io/. |
| Open Datasets | Yes | Letterworld Environment. Introduced in [4, 38], Letterworld is a 7x7 grid where the agent occupies one square at a time. |
| Dataset Splits | No | The paper describes the training and evaluation procedures in the context of reinforcement learning, which involves training an agent through interaction with an environment and evaluating performance on sampled tasks/episodes. It does not explicitly define traditional training, validation, and test splits for a static dataset. |
| Hardware Specification | Yes | Each seed in the experiments section was run as an individual Slurm job with access to 4 cores of an AMD EPYC 7763 running at 2.45GHz and access to at most 20gb of memory. |
| Software Dependencies | No | The paper mentions specific algorithms and network architectures like GATv2 [9], RGCN [34], and PPO [35], but does not provide version numbers for any specific software libraries or dependencies (e.g., PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 1 shows the hyperparameters used for every training run in the experiments section. |