Deriving Neural Architectures from Sequence and Graph Kernels

Authors: Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we apply the proposed sequence and graph modules to various tasks and empirically evaluate their performance against other neural network models. These tasks include language modeling, sentiment classification and molecule regression."
Researcher Affiliation | Academia | "MIT Computer Science & Artificial Intelligence Laboratory."
Pseudocode | No | The paper describes the neural operations using mathematical equations (e.g., Eq. 4, 6, 8, 9) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code available at https://github.com/taolei87/icml17_knn"
Open Datasets | Yes | "We use the Penn Tree Bank (PTB) corpus as the benchmark." and "We use the Stanford Sentiment Treebank benchmark (Socher et al., 2013)." and "We further evaluate our graph NN models on the Harvard Clean Energy Project benchmark, which has been used in Dai et al. (2016); Duvenaud et al. (2015) as their evaluation dataset."
Dataset Splits | Yes | "We use the standard train/development/test split of this dataset with vocabulary of size 10,000."
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments (e.g., specific GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions optimizers like SGD and Adam, and techniques like dropout, but does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "Following standard practice, we use SGD with an initial learning rate of 1.0 and decrease the learning rate by a constant factor after a certain epoch. We back-propagate the gradient with an unroll size of 35 and use dropout (Hinton et al., 2012) as the regularization." and "Our best model is a 3-layer network with n = 2 and hidden dimension d = 200. ... The model is optimized with Adam (Kingma & Ba, 2015), and dropout probability of 0.35."
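
To make the quoted language-modeling hyperparameters concrete, below is a minimal PyTorch-style training sketch, not the authors' implementation. A small LSTM stands in for the paper's kernel-derived sequence module; the decay factor, decay epoch, hidden size, and language-model dropout value are assumptions, while SGD with an initial learning rate of 1.0, a constant-factor decay after a certain epoch, an unroll size of 35, and the use of dropout follow the quoted text.

```python
import torch
import torch.nn as nn

# Hedged sketch of the quoted training recipe; the model is a stand-in,
# NOT the kernel-derived architecture from the paper.
VOCAB = 10_000       # quoted in the "Dataset Splits" row (PTB vocabulary)
HIDDEN = 200         # assumed here; d = 200 is quoted only for the sentiment model
UNROLL = 35          # quoted BPTT unroll size
INIT_LR = 1.0        # quoted initial SGD learning rate
DECAY_FACTOR = 0.5   # "constant factor" -- exact value not given, assumed
DECAY_START = 6      # "after a certain epoch" -- epoch not given, assumed
DROPOUT = 0.5        # dropout is quoted; this probability for the LM is assumed

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.drop = nn.Dropout(DROPOUT)
        self.rnn = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens, state=None):
        h, state = self.rnn(self.drop(self.embed(tokens)), state)
        return self.out(h), state

model = TinyLM()
optimizer = torch.optim.SGD(model.parameters(), lr=INIT_LR)
criterion = nn.CrossEntropyLoss()

# Illustrative training loop over random data, consuming UNROLL tokens per step.
data = torch.randint(0, VOCAB, (1, UNROLL + 1))
for epoch in range(10):
    if epoch >= DECAY_START:
        # Decrease the learning rate by a constant factor after a certain epoch.
        for group in optimizer.param_groups:
            group["lr"] *= DECAY_FACTOR
    inputs, targets = data[:, :-1], data[:, 1:]
    logits, _ = model(inputs)
    loss = criterion(logits.reshape(-1, VOCAB), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The quoted sentiment model uses a different recipe: a 3-layer network with
# n = 2 and hidden dimension d = 200, trained with Adam and dropout 0.35.
```

The per-epoch learning-rate scaling mirrors the quoted "decrease the learning rate by a constant factor after a certain epoch"; any values not quoted in the row above should be treated as placeholders rather than the paper's settings.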