Learning to Synthesize Programs as Interpretable and Generalizable Policies
Authors: Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. |
| Researcher Affiliation | Collaboration | Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim (University of Southern California) {dtrivedi, jessez, shaohuas, limjj}@usc.edu ... Work partially done as a visiting scholar at USC. AI Advisor at NAVER AI Lab. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 1 describes the grammar of the domain-specific language (DSL), but a grammar specification is not an algorithm. |
| Open Source Code | Yes | Website at https://clvrai.com/leaps. |
| Open Datasets | No | The paper states that it generated a dataset of 50,000 unique programs but does not provide access information (a link, citation, or repository) for this generated dataset. |
| Dataset Splits | Yes | This dataset is split into a training set with 35,000 programs, a validation set with 7,500 programs, and a testing set with 7,500 programs (a minimal split sketch follows the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like PPO [67], SAC [68], and VAE, but it does not specify their version numbers. |
| Experiment Setup | Yes | Hyperparameters for the VAE model are: latent dimension of 64, learning rate 1e-4, batch size 64, and encoder and decoder RNNs each using 2 layers of GRUs with 256 hidden units. Training is performed with the Adam optimizer for 200 epochs (see the configuration sketch after the table). |
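For concreteness, here is a minimal sketch of the 35,000 / 7,500 / 7,500 split quoted in the Dataset Splits row. Only the split sizes come from the paper; the placeholder program list and the fixed shuffle seed are assumptions.

```python
# Minimal sketch of the 35,000 / 7,500 / 7,500 split from the paper.
# The placeholder programs and the fixed seed are assumptions, not the
# authors' actual generation or shuffling procedure.
import random

programs = [f"program_{i}" for i in range(50_000)]  # stand-in for the 50,000 generated programs
random.Random(0).shuffle(programs)                  # assumed fixed-seed shuffle

train = programs[:35_000]
val = programs[35_000:42_500]
test = programs[42_500:]
assert len(train) == 35_000 and len(val) == 7_500 and len(test) == 7_500
```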
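Similarly, the Experiment Setup row quotes a concrete VAE configuration; the sketch below wires those numbers (latent dimension 64, 2-layer GRU encoder and decoder with 256 hidden units, Adam at learning rate 1e-4, batch size 64, 200 epochs) into a PyTorch model. The vocabulary size, module names, and sequence handling are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a sequence VAE matching the hyperparameters quoted above.
# Latent/hidden sizes, layer counts, and optimizer settings come from the
# paper; VOCAB_SIZE, class/attribute names, and decoding are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 50   # assumption: size of the DSL token vocabulary
LATENT_DIM = 64   # from the paper
HIDDEN_DIM = 256  # from the paper
NUM_LAYERS = 2    # from the paper

class ProgramVAE(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN_DIM)
        self.encoder = nn.GRU(HIDDEN_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.to_mu = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.to_logvar = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.latent_to_hidden = nn.Linear(LATENT_DIM, HIDDEN_DIM)
        self.decoder = nn.GRU(HIDDEN_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer program token ids
        emb = self.embed(tokens)
        _, h = self.encoder(emb)           # h: (num_layers, batch, hidden)
        mu = self.to_mu(h[-1])             # use the last layer's final state
        logvar = self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        h0 = self.latent_to_hidden(z).unsqueeze(0).repeat(NUM_LAYERS, 1, 1)
        dec, _ = self.decoder(emb, h0)     # teacher forcing on the inputs
        return self.out(dec), mu, logvar   # token logits plus posterior params

model = ProgramVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the paper

# Smoke test: one batch of 64 sequences (batch size from the paper); training
# would loop over such batches for 200 epochs with a reconstruction + KL loss.
tokens = torch.randint(0, VOCAB_SIZE, (64, 20))
logits, mu, logvar = model(tokens)
```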