Sequence to Better Sequence: Continuous Revision of Combinatorial Structures

Authors: Jonas Mueller, David Gifford, Tommi Jaakkola

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Table 1 compares various methods for proposing revisions. We applied all aforementioned approaches to produce revisions for a held-out set of 1000 test sentences.
Researcher Affiliation | Academia | MIT Computer Science & Artificial Intelligence Laboratory. Correspondence to: J. Mueller <jonasmueller@csail.mit.edu>.
Pseudocode | Yes | REVISE algorithm. Input: sequence x0 ∈ X, constant α ∈ (0, |2πΣ_{z|x0}|^{-1/2}). Output: revised sequence x* ∈ X. 1) Use E to compute q_E(z | x0). 2) Define C_{x0} = {z ∈ R^d : q_E(z | x0) ≥ α}. 3) Find z* = argmax_{z ∈ C_{x0}} F(z) via gradient ascent. 4) Return x* = D(z*) via beam search. (A code sketch of this procedure follows the table.)
Open Source Code | No | The paper does not provide any statement about open-source code availability, specific repository links, or mention of code in supplementary materials.
Open Datasets | Yes | Next, we apply our model to 1M reviews from BeerAdvocate (McAuley et al., 2012). For our final application, we assemble a dataset of 100K short sentences which are either from Shakespeare or a more contemporary source (details in S2.3).
Dataset Splits | No | all models were trained using n = 10,000 (sequence, outcome) pairs sampled from the generative grammar. (The paper mentions training and testing data, but does not provide specific percentages, counts, or citations for dataset splits or validation sets.)
Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | All of our RNNs employ the Gated Recurrent Unit (GRU) of Cho et al. (2014), which contains a simple gating mechanism to effectively learn long-range dependencies across a sequence. VADER is a complex rule-based sentiment analysis tool which jointly estimates polarity and intensity of English text. (The paper mentions software tools/architectures but does not provide specific version numbers for software dependencies.) A VADER usage sketch follows the table.
Experiment Setup | No | All of our RNNs employ the Gated Recurrent Unit (GRU) of Cho et al. (2014)... Throughout, F is a simple feedforward network with 1 hidden layer and tanh activations... latent dimension d = 128. Training is done via stochastic gradient descent applied to minimize the following objective over the examples in D_n... numerous mini-batch stochastic gradient updates (typically 10-30 epochs) are applied within every one of these steps... (The paper provides some architectural details and high-level training steps but lacks specific hyperparameter values like learning rate, batch size, or optimizer settings.) A sketch of these model components follows the table.
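
The REVISE pseudocode quoted above translates into a short constrained-optimization loop. The following is a minimal PyTorch sketch, not the authors' implementation: `encoder`, `outcome_net`, and `decoder` are hypothetical stand-ins for the paper's E, F, and D networks, the posterior is assumed to be a diagonal Gaussian as in the paper's variational autoencoder setup, and the constraint q_E(z | x0) ≥ α is enforced here by projecting onto the equivalent Mahalanobis ball (the paper only states that step 3 is solved by gradient ascent).

```python
import math
import torch

def revise(x0, encoder, outcome_net, decoder, alpha=1e-3, lr=0.05, steps=100):
    """Sketch of REVISE: constrained gradient ascent on the outcome model F
    over the latent space, starting from the encoding of x0, followed by
    decoding of the revised latent point. All three callables are
    hypothetical stand-ins for the paper's E, F, and D."""
    # 1) q_E(z | x0) = N(mu, diag(sigma^2)) from the encoder
    mu, log_var = encoder(x0)
    mu, sigma2 = mu.detach(), log_var.detach().exp()

    # q_E(z | x0) >= alpha is equivalent to a bound on the squared Mahalanobis
    # distance from mu; alpha must stay below the density maximum
    # |2*pi*Sigma|^{-1/2} for this bound to be positive.
    radius2 = -2.0 * math.log(alpha) - torch.log(2 * math.pi * sigma2).sum()

    # 2)-3) gradient ascent on F(z), projected back into C_{x0} when violated
    z = mu.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(outcome_net(z), z)
        with torch.no_grad():
            z += lr * grad
            maha2 = ((z - mu) ** 2 / sigma2).sum()
            if maha2 > radius2:
                z.copy_(mu + (z - mu) * torch.sqrt(radius2 / maha2))

    # 4) decode the revised latent point (the paper uses beam search here)
    return decoder(z.detach())
```

With trained components, `revise(x0, E, F, D)` would return the proposed revision; tightening alpha keeps the revision closer to the original sequence x0.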
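The architectural details quoted in the Experiment Setup row (GRU sequence encoder, one-hidden-layer tanh network for F, latent dimension d = 128) pin down the model components but not the framework or remaining hyperparameters. The sketch below is one plausible PyTorch rendering; the embedding size, hidden size, and the plain SGD optimizer at the end are assumptions, since the paper does not report them, and the GRU decoder D is omitted for brevity.

```python
import torch.nn as nn
import torch.optim as optim

LATENT_DIM = 128   # d = 128, as stated in the paper
HIDDEN_DIM = 256   # hypothetical: the paper does not report GRU/F hidden sizes

class Encoder(nn.Module):
    """GRU encoder E producing the mean and log-variance of q_E(z | x)."""
    def __init__(self, vocab_size, embed_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, HIDDEN_DIM, batch_first=True)
        self.to_mu = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.to_log_var = nn.Linear(HIDDEN_DIM, LATENT_DIM)

    def forward(self, tokens):               # tokens: (batch, seq_len) int64
        _, h = self.gru(self.embed(tokens))  # h: (num_layers, batch, HIDDEN_DIM)
        return self.to_mu(h[-1]), self.to_log_var(h[-1])

class OutcomeNet(nn.Module):
    """F: a feedforward network with one hidden layer and tanh activations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN_DIM),
            nn.Tanh(),
            nn.Linear(HIDDEN_DIM, 1),
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)

# Training uses mini-batch stochastic gradient descent; the learning rate is a
# placeholder because the paper does not report one.
encoder, outcome_net = Encoder(vocab_size=10000), OutcomeNet()
optimizer = optim.SGD(
    list(encoder.parameters()) + list(outcome_net.parameters()), lr=0.01)
```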
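For the BeerAdvocate sentiment experiments, the paper relies on VADER to produce the scalar outcome attached to each sentence, but it does not name a specific implementation. One common choice is the vaderSentiment Python package, whose compound score in [-1, 1] could serve as that outcome; the example sentence below is illustrative, not taken from the paper's data.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def sentiment_outcome(sentence: str) -> float:
    """Scalar sentiment outcome: VADER's compound score, in [-1, 1]."""
    return analyzer.polarity_scores(sentence)["compound"]

print(sentiment_outcome("Pours a hazy amber with a rich, lasting head; wonderful malt character."))
```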