Deriving Neural Architectures from Sequence and Graph Kernels
Authors: Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we apply the proposed sequence and graph modules to various tasks and empirically evaluate their performance against other neural network models. These tasks include language modeling, sentiment classification and molecule regression." |
| Researcher Affiliation | Academia | "MIT Computer Science & Artificial Intelligence Laboratory." |
| Pseudocode | No | The paper describes the neural operations using mathematical equations (e.g., Eq. 4, 6, 8, 9) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code available at https://github.com/taolei87/icml17_knn" |
| Open Datasets | Yes | "We use the Penn Tree Bank (PTB) corpus as the benchmark." and "We use the Stanford Sentiment Treebank benchmark (Socher et al., 2013)." and "We further evaluate our graph NN models on the Harvard Clean Energy Project benchmark, which has been used in Dai et al. (2016); Duvenaud et al. (2015) as their evaluation dataset." |
| Dataset Splits | Yes | "We use the standard train/development/test split of this dataset with vocabulary of size 10,000." |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments (e.g., specific GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam, and techniques like dropout, but does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | "Following standard practice, we use SGD with an initial learning rate of 1.0 and decrease the learning rate by a constant factor after a certain epoch. We back-propagate the gradient with an unroll size of 35 and use dropout (Hinton et al., 2012) as the regularization." and "Our best model is a 3-layer network with n = 2 and hidden dimension d = 200. ... The model is optimized with Adam (Kingma & Ba, 2015), and dropout probability of 0.35." (see the configuration sketch after this table) |
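
To make the quoted hyperparameters concrete, the sketch below assembles them into a minimal PyTorch training configuration. The `StandInLM` module is a generic LSTM placeholder, not the paper's kernel-derived architecture (that lives in the linked repository), and the learning-rate milestone and decay factor are assumptions, since the paper only states that the rate is decreased "by a constant factor after a certain epoch".

```python
import torch
import torch.nn as nn

class StandInLM(nn.Module):
    """Generic stand-in model; hyperparameters follow the quotes in the table above."""
    def __init__(self, vocab_size=10000, d=200, layers=3, dropout=0.35):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)   # PTB vocabulary of size 10,000
        self.drop = nn.Dropout(dropout)            # dropout probability 0.35 (best model)
        self.rnn = nn.LSTM(d, d, num_layers=layers, dropout=dropout)  # 3 layers, d = 200
        self.out = nn.Linear(d, vocab_size)

    def forward(self, x, state=None):
        h, state = self.rnn(self.drop(self.embed(x)), state)
        return self.out(self.drop(h)), state

model = StandInLM()

# Language-modeling runs: SGD with initial learning rate 1.0, decayed by a
# constant factor after a certain epoch (milestone and factor below are assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.5)

# The best sentiment/regression model is instead optimized with Adam:
# optimizer = torch.optim.Adam(model.parameters())

UNROLL = 35  # gradients are back-propagated with an unroll size of 35
```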