Sequence to Better Sequence: Continuous Revision of Combinatorial Structures
Authors: Jonas Mueller, David Gifford, Tommi Jaakkola
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 1 compares various methods for proposing revisions. We applied all aforementioned approaches to produce revisions for a held-out set of 1000 test sentences. |
| Researcher Affiliation | Academia | MIT Computer Science & Artificial Intelligence Laboratory. Correspondence to: J. Mueller <jonasmueller@csail.mit.edu>. |
| Pseudocode | Yes | REVISE Algorithm. Input: sequence x0 ∈ X, constant α ∈ (0, \|2πΣ(z\|x0)\|^(−1/2)). Output: revised sequence x* ∈ X. 1) Use E to compute q_E(z \| x0). 2) Define C_x0 = {z ∈ R^d : q_E(z \| x0) ≥ α}. 3) Find z* = argmax_{z ∈ C_x0} F(z) (gradient ascent). 4) Return x* = D(z*) (beam search). (A hedged code sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not provide any statement about open-source code availability, specific repository links, or mention of code in supplementary materials. |
| Open Datasets | Yes | Next, we apply our model to 1M reviews from Beer Advocate (McAuley et al., 2012). For our final application, we assemble a dataset of 100K short sentences which are either from Shakespeare or a more contemporary source (details in S2.3). |
| Dataset Splits | No | all models were trained using n = 10,000 (sequence, outcome) pairs sampled from the generative grammar. (The paper mentions training and testing data, but does not provide specific percentages, counts, or citations for dataset splits or validation sets.) |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | All of our RNNs employ the Gated Recurrent Unit (GRU) of Cho et al. (2014), which contains a simple gating mechanism to effectively learn long-range dependencies across a sequence. VADER is a complex rule-based sentiment analysis tool which jointly estimates polarity and intensity of English text. (The paper mentions software tools/architectures but does not provide specific version numbers for software dependencies.) |
| Experiment Setup | No | All of our RNNs employ the Gated Recurrent Unit (GRU) of Cho et al. (2014)... Throughout, F is a simple feedforward network with 1 hidden layer and tanh activations... latent dimension d = 128. Training is done via stochastic gradient descent applied to minimize the following objective over the examples in Dn... numerous mini-batch stochastic gradient updates (typically 10-30 epochs) are applied within every one of these steps... (The paper provides some architectural details and high-level training steps but lacks specific hyperparameter values like learning rate, batch size, or optimizer settings. A minimal model sketch follows the table.) |
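Below is a minimal Python sketch of the REVISE procedure quoted in the Pseudocode row, assuming a trained encoder E that yields a diagonal-Gaussian posterior q_E(z | x0), an outcome predictor F, and a beam-search decoder D. The stub modules, the shrink-toward-the-mean projection heuristic, and all hyperparameters (`steps`, `lr`, the 0.9 shrink factor, the demo `alpha`) are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

d = 128  # latent dimension reported in the paper

class Encoder(torch.nn.Module):
    """Stand-in encoder E: maps a sequence to q_E(z | x) = N(mu, diag(sigma^2))."""
    def forward(self, x):
        # A real model would run a GRU over x; here we return fixed parameters.
        return torch.zeros(d), torch.zeros(d)  # (mu, log_sigma)

def F(z):
    """Stand-in outcome predictor (the paper uses a 1-hidden-layer tanh net)."""
    return torch.tanh(z).sum()

def decode_beam_search(z):
    """Stand-in decoder D: beam-search decoding of z back into a sequence."""
    return "<revised sequence>"

def log_q(z, mu, log_sigma):
    """Log-density of the diagonal Gaussian posterior q_E(z | x0) at z."""
    var = torch.exp(2 * log_sigma)
    return -0.5 * (((z - mu) ** 2 / var).sum()
                   + (2 * log_sigma).sum()
                   + d * math.log(2 * math.pi))

def revise(x0, alpha, steps=100, lr=0.01):
    mu, log_sigma = Encoder()(x0)             # 1) compute q_E(z | x0)
    log_alpha = math.log(alpha)
    z = mu.clone().requires_grad_(True)       # start ascent at the posterior mean
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):                    # 3) gradient ascent on F over C_x0
        opt.zero_grad()
        (-F(z)).backward()                    # maximize F by minimizing -F
        opt.step()
        with torch.no_grad():                 # heuristic projection back into C_x0:
            while log_q(z, mu, log_sigma) < log_alpha:
                z.copy_(mu + 0.9 * (z - mu))  # shrink toward mu until feasible
    return decode_beam_search(z.detach())     # 4) return x* = D(z*)

# alpha must lie below the posterior's peak density |2*pi*Sigma|^(-1/2),
# which for the Sigma = I stub above is (2*pi)^(-d/2).
alpha_max = (2 * math.pi) ** (-d / 2)
print(revise(x0=None, alpha=0.5 * alpha_max))
```

The constraint set C_x0 keeps the revised latent code in a region where the approximate posterior assigns reasonable density, which is what makes the decoded sequence stay close to x0; the projection loop above is one simple way to enforce that during ascent.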
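And a hedged sketch of the architecture details quoted in the Experiment Setup row: a GRU encoder/decoder plus a one-hidden-layer tanh feedforward outcome predictor F, with latent dimension d = 128. The vocabulary size, embedding size, learning rate, and single-layer GRU configuration are assumptions; the paper does not report them.

```python
import torch
import torch.nn as nn

d, vocab_size, emb_size = 128, 10000, 128  # d = 128 per the paper; rest assumed

class Seq2BetterSeq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.encoder_gru = nn.GRU(emb_size, d, batch_first=True)  # encoder E
        self.decoder_gru = nn.GRU(emb_size, d, batch_first=True)  # decoder D
        self.out = nn.Linear(d, vocab_size)
        # F: feedforward net with 1 hidden layer and tanh activations (as stated)
        self.F = nn.Sequential(nn.Linear(d, d), nn.Tanh(), nn.Linear(d, 1))

    def forward(self, x, y_in):
        # Encode x into a latent code z (final GRU hidden state).
        _, h = self.encoder_gru(self.embed(x))
        z = h.squeeze(0)
        # Decode conditioned on z (teacher forcing) and predict the outcome.
        dec, _ = self.decoder_gru(self.embed(y_in), h)
        return self.out(dec), self.F(z)

model = Seq2BetterSeq()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # learning rate assumed

x = torch.randint(0, vocab_size, (4, 12))  # toy batch of token ids
logits, y_hat = model(x, x)                # teacher-forced decode + predicted outcome
```

Per the quoted text, training would minimize a joint objective over the (sequence, outcome) pairs in Dn, combining a reconstruction term on `logits` with an outcome-prediction term on `y_hat`; the exact objective and optimizer settings are not specified in the paper.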