Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning to Synthesize Programs as Interpretable and Generalizable Policies

Authors: Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies.
Researcher Affiliation | Collaboration | Dweep Trivedi Jesse Zhang 1 Shao-Hua Sun1 Joseph J. Lim 1 1University of Southern California EMAIL ... Work partially done as a visiting scholar at USC. AI Advisor at NAVER AI Lab.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 1 describes the Domain Specific Language (DSL) grammar, but it is not an algorithm.
Open Source Code | Yes | Website at https://clvrai.com/leaps.
Open Datasets | No | The paper states it generated a dataset of 50,000 unique programs but does not provide access information (link, citation, repository) for this specific generated dataset.
Dataset Splits | Yes | This dataset is split into a training set with 35,000 programs, a validation set with 7,500 programs, and a testing set with 7,500 programs.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software components like PPO [67], SAC [68], and VAE, but it does not specify their version numbers.
Experiment Setup | Yes | Hyperparameters for the VAE model are: latent dimension of 64, learning rate 1e-4, batch size 64, encoder and decoder RNNs using 2 layers of GRUs with 256 hidden units. Training is performed using Adam optimizer for 200 epochs.
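
The Dataset Splits and Experiment Setup entries above can be collected into a small, checkable form. The following is a minimal sketch, not the authors' code: the names `VAEConfig`, `split_programs`, and `steps_per_epoch` are hypothetical, and the shuffle-then-slice strategy, the seed, and the choice to keep a final partial batch are assumptions. The report gives only the split sizes and the hyperparameter values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VAEConfig:
    """Hyperparameter values as listed in the Experiment Setup entry.

    Field names are illustrative; only the values come from the report.
    """
    latent_dim: int = 64
    learning_rate: float = 1e-4
    batch_size: int = 64
    rnn_type: str = "GRU"        # used for both encoder and decoder
    rnn_layers: int = 2
    rnn_hidden_units: int = 256
    optimizer: str = "Adam"
    epochs: int = 200


def split_programs(programs, n_train=35_000, n_val=7_500, n_test=7_500, seed=0):
    """Partition programs into train/validation/test subsets.

    Assumption: a seeded shuffle followed by slicing. The report states
    only the subset sizes, not how programs were assigned to them.
    """
    if len(programs) != n_train + n_val + n_test:
        raise ValueError("split sizes must sum to the dataset size")
    shuffled = list(programs)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def steps_per_epoch(n_train, batch_size):
    """Number of batches per pass, assuming the last partial batch is kept."""
    return -(-n_train // batch_size)  # ceiling division


if __name__ == "__main__":
    cfg = VAEConfig()
    # Integer IDs stand in for the 50,000 generated programs.
    train, val, test = split_programs(list(range(50_000)))
    print(len(train), len(val), len(test))
    print(steps_per_epoch(len(train), cfg.batch_size))
```

The size check in `split_programs` reflects that the three reported subsets (35,000 + 7,500 + 7,500) account for all 50,000 generated programs, so no program is left unassigned.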