Learning to Synthesize Programs as Interpretable and Generalizable Policies

Authors: Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies.
Researcher Affiliation | Collaboration | Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim (University of Southern California; {dtrivedi, jessez, shaohuas, limjj}@usc.edu) ... Work partially done as a visiting scholar at USC. AI Advisor at NAVER AI Lab.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 1 describes the Domain Specific Language (DSL) grammar, but it is not an algorithm.
Open Source Code | Yes | Website at https://clvrai.com/leaps.
Open Datasets | No | The paper states it generated a dataset of 50,000 unique programs but does not provide access information (link, citation, repository) for this specific generated dataset.
Dataset Splits | Yes | This dataset is split into a training set with 35,000 programs, a validation set with 7,500 programs, and a testing set with 7,500 programs (see the split sketch after this table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software components such as PPO [67], SAC [68], and a VAE, but it does not specify their version numbers.
Experiment Setup | Yes | Hyperparameters for the VAE model are: latent dimension of 64, learning rate 1e-4, batch size 64, and encoder and decoder RNNs using 2 layers of GRUs with 256 hidden units. Training is performed with the Adam optimizer for 200 epochs (a configuration sketch follows this table).
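
The generated-program dataset is not released, but the reported 35,000 / 7,500 / 7,500 split of the 50,000 programs is straightforward to reproduce once the programs are regenerated. Below is a minimal sketch, assuming `programs` is a list of 50,000 generated program strings; the function name `split_dataset` and the fixed seed are illustrative assumptions, not details from the paper.

```python
import random

def split_dataset(programs, seed=0):
    """Shuffle and split 50,000 programs into the 35,000 / 7,500 / 7,500
    train/validation/test partition reported in the paper. The seed is an
    illustrative assumption; the paper does not specify one."""
    assert len(programs) == 50_000
    rng = random.Random(seed)
    shuffled = list(programs)
    rng.shuffle(shuffled)
    train = shuffled[:35_000]        # 70% training
    val = shuffled[35_000:42_500]    # 15% validation
    test = shuffled[42_500:]         # 15% testing
    return train, val, test
```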
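The reported VAE hyperparameters map onto a standard sequence-to-sequence VAE. The following PyTorch sketch is consistent with those numbers (latent dimension 64, 2-layer GRU encoder and decoder with 256 hidden units, Adam at learning rate 1e-4, batch size 64, 200 epochs); the vocabulary size, embedding dimension, and all class and variable names are assumptions rather than details confirmed by the paper.

```python
import torch
import torch.nn as nn

class ProgramVAE(nn.Module):
    """Seq2seq VAE sketch matching the reported hyperparameters.
    vocab_size and embed_dim are assumed; the paper does not state them."""

    def __init__(self, vocab_size=50, embed_dim=128, hidden=256, latent=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden, num_layers=2, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.latent_to_h0 = nn.Linear(latent, 2 * hidden)  # init 2 decoder layers
        self.decoder = nn.GRU(embed_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)                               # (B, T, E)
        _, h = self.encoder(emb)                               # h: (layers, B, H)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        h0 = self.latent_to_h0(z).view(-1, 2, 256).permute(1, 0, 2).contiguous()
        dec, _ = self.decoder(emb, h0)                         # teacher forcing
        return self.out(dec), mu, logvar

model = ProgramVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr 1e-4
# Training would iterate for 200 epochs over minibatches of 64 programs.
```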