Latent Attention For If-Then Program Synthesis

Authors: Chang Liu, Xinyun Chen, Eui Chul Shin, Mingcheng Chen, Dawn Song

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our architecture reduces the error rate by 28.57% compared to prior art [3]. We use the same crawler from Quirk et al. [16] to crawl recipes from IFTTT.com. We evaluate two embedding methods as well as the effectiveness of different attention mechanisms. Figure 2 and Figure 3 present the results of prediction accuracy on channel and function respectively.
Researcher Affiliation | Collaboration | Xinyun Chen (Shanghai Jiao Tong University; part of the work was done while visiting UC Berkeley); Chang Liu, Richard Shin, Dawn Song (UC Berkeley); Mingcheng Chen (UIUC; work was done while visiting UC Berkeley, currently working at Google [X]).
Pseudocode | No | The paper includes a network architecture diagram and mathematical equations but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing open-source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | We use the same crawler from Quirk et al. [16] to crawl recipes from IFTTT.com. Unfortunately, many recipes are no longer available. We crawled all remaining recipes, ultimately obtaining 68,083 recipes for the training set. [16] also provides a list of 5,171 recipes for validation, and 4,294 recipes for test. The paper describes the dataset used but does not provide concrete access information (e.g., a URL, DOI, or repository) for it, referring only to the crawling process and a recipe list from prior work.
Dataset Splits | Yes | We crawled all remaining recipes, ultimately obtaining 68,083 recipes for the training set. [16] also provides a list of 5,171 recipes for validation, and 4,294 recipes for test. We found that only 4,220 validation recipes and 3,868 test recipes remain available.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x).
Experiment Setup | Yes | Architectures with no attention were trained with an initial learning rate of 0.01, multiplied by 0.9 every 1,000 time steps; gradients with L2 norm greater than 5 were scaled down to have norm 5. Architectures with either the standard attention mechanism or Latent Attention were trained with a learning rate of 0.001 without decay, and gradients with L2 norm greater than 40 were scaled down to have norm 40. All models were trained using Adam [11]. All weights were initialized uniformly at random in [-0.1, 0.1]. Mini-batches were randomly shuffled during training. The mini-batch size is 32 and the embedding vector size d is 50.
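
As a reading aid for the setup quoted in the Experiment Setup row, below is a minimal sketch of that training configuration in PyTorch. It is not the authors' code (none was released); the model, loss function, and batch format are hypothetical placeholders, and the split-size constants simply restate the counts quoted in the Dataset Splits row.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn as nn

EMBED_DIM = 50   # embedding vector size d = 50 (as reported)
BATCH_SIZE = 32  # mini-batch size (as reported)

# Split sizes as quoted in the report (available recipes at crawl time).
SPLIT_SIZES = {"train": 68_083, "validation": 4_220, "test": 3_868}

def init_weights(model: nn.Module) -> None:
    # All weights initialized uniformly at random in [-0.1, 0.1].
    for p in model.parameters():
        nn.init.uniform_(p, -0.1, 0.1)

def make_optimizer(model: nn.Module, uses_attention: bool):
    """Return (optimizer, scheduler, clip_norm) for the two reported regimes."""
    if uses_attention:
        # Standard attention or Latent Attention: lr 0.001, no decay, clip norm 40.
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        return opt, None, 40.0
    # No attention: lr 0.01, multiplied by 0.9 every 1,000 steps, clip norm 5.
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.9)
    return opt, sched, 5.0

def train_step(model, batch, loss_fn, opt, sched, clip_norm):
    # `batch` is a hypothetical dict with "input" and "target" tensors.
    opt.zero_grad()
    loss = loss_fn(model(batch["input"]), batch["target"])
    loss.backward()
    # Gradients with L2 norm above clip_norm are scaled down to that norm.
    nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    opt.step()
    if sched is not None:
        sched.step()  # scheduler advanced per time step, per the reported schedule
    return loss.item()
```

The sketch only encodes what the row states: the two regimes differ in learning-rate schedule and gradient-clipping threshold, with Adam, uniform initialization in [-0.1, 0.1], batch size 32, and embedding size 50 shared across all models; everything else (model architecture, data pipeline) is left abstract because the paper provides no code.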