Amortizing Pragmatic Program Synthesis with Rankings

Authors: Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena Glassman, Daniel Fried

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two program synthesis domains using our ranking method resulted in orders-of-magnitude speedups compared to the exact RSA synthesizer, while being more accurate than a non-pragmatic synthesizer when communicating with humans.
Researcher Affiliation | Collaboration | Autodesk AI Research, Carnegie Mellon University, Harvard SEAS
Pseudocode | Yes | Algorithm 1 ("Algorithm to obtain a dataset of simulated interactions between a speaker S and listener L; for each turn of each interaction, a ranking of programs is obtained") and Algorithm 2 ("Algorithm to infer a global order σ based on a dataset of simulated interactions, which terminates based on a validation criterion determined by the validation frequency V, patience t, and convergence threshold T"). A sketch of this validation-based stopping criterion is given after the table.
Open Source Code | Yes | Please find all simulation and replay results at this repository: https://github.com/evanthebouncy/pragmatic_synthesis_ranking/tree/main
Open Datasets | Yes | We conduct two replay studies by simulating virtual users giving examples one after another, using human interaction data collected from prior works. We specifically refer to Pu et al. (2020) and Vaithilingam et al. (2023) as sources for this data.
Dataset Splits | No | The paper mentions using a validation set: 'We use a validation set generated similarly to D (on a disjoint set of programs) to perform validation, choosing the model that results in the highest synthesis accuracy on this validation dataset with synthetically produced examples (from the S1 speaker model).' However, it does not provide specific split percentages, sample counts, or explicit details about the splitting methodology.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., 'Python 3.8, PyTorch 1.9') for replicating the experiments.
Experiment Setup | Yes | The input is then passed through 3 hidden layers of size 128, each of which has a ReLU activation, and is then mapped to a scalar output with a linear layer. The model is trained on a dataset of rankings of the form D = {(w, u, σ_u)}. For each program w, we sample a pair of programs from the inferred ranking σ_u and use this pair to compute the loss function for this sample. We train the model for a maximum of 20 epochs, where one epoch of training corresponds to presenting the model with every element in D once. We train with a batch size of 32 using the Adam optimizer. A training-loop sketch of this setup is given after the table.
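
As a reading aid for the Pseudocode row above, here is a minimal sketch of the validation-based stopping criterion described for Algorithm 2: validate every V updates, keep the best global order found so far, and stop once `patience` consecutive validations fail to improve accuracy by more than `threshold`. Only this control flow is taken from the caption; the helpers `update_order` and `validation_accuracy` (and the initial order) are hypothetical placeholders, not the paper's implementation.

```python
import itertools
import random

# Hypothetical stand-ins for the paper's actual subroutines.
def update_order(order, interaction):
    """Fold one simulated interaction into the current global order (placeholder)."""
    return order

def validation_accuracy(order, val_set):
    """Synthesis accuracy of a ranking-based listener on held-out data (placeholder)."""
    return random.random()

def infer_global_order(dataset, val_set, V=10, patience=3, threshold=1e-3):
    """Validate every V updates; stop after `patience` validations whose
    improvement over the best accuracy so far is below `threshold`."""
    order = list(range(len(dataset)))          # placeholder initial order sigma
    best_acc, best_order, stale = float("-inf"), list(order), 0
    for step, interaction in enumerate(itertools.cycle(dataset), start=1):
        order = update_order(order, interaction)
        if step % V == 0:                      # validation frequency V
            acc = validation_accuracy(order, val_set)
            if acc > best_acc + threshold:     # convergence threshold T
                best_acc, best_order, stale = acc, list(order), 0
            else:
                stale += 1                     # another validation without improvement
            if stale >= patience:              # patience t exhausted: stop
                break
    return best_order
```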
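
For the Experiment Setup row, the following is a minimal PyTorch sketch of the described scoring network (3 hidden layers of size 128 with ReLU, then a linear map to a scalar) and its pairwise training loop (Adam, batch size 32, at most 20 epochs). The input dimensionality `IN_DIM`, the encoding of a (w, program) pair into a feature vector, and the Bradley-Terry-style pairwise loss are assumptions for illustration; the excerpt does not specify them.

```python
import torch
import torch.nn as nn

IN_DIM = 64  # hypothetical size of an encoded (w, program) feature vector

class RankScorer(nn.Module):
    """3 hidden layers of size 128 with ReLU, then a linear map to a scalar score."""
    def __init__(self, in_dim=IN_DIM, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar ranking score
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def pairwise_loss(score_preferred, score_other):
    # Assumed Bradley-Terry-style loss: the program ranked higher in sigma_u
    # should receive the larger score.
    return nn.functional.softplus(score_other - score_preferred).mean()

def train(model, loader, epochs=20, lr=1e-3):
    """`loader` is assumed to yield batches (x_preferred, x_other) of encoded
    program pairs sampled from the inferred rankings, with batch size 32."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):          # at most 20 epochs, as described above
        for x_pref, x_other in loader:
            loss = pairwise_loss(model(x_pref), model(x_other))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```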