Learning to Compose Task-Specific Tree Structures

Authors: Jihun Choi, Kang Min Yoo, Sang-goo Lee

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed model on natural language inference and sentiment analysis, and show that our model outperforms or is at least comparable to previous models. From experiments on natural language inference and sentiment analysis tasks, we find that our proposed model outperforms or is at least comparable to previous sentence encoder models and converges significantly faster than them.
Researcher Affiliation | Academia | Jihun Choi, Kang Min Yoo, Sang-goo Lee; Seoul National University, Seoul 08826, Korea; {jhchoi, kangminyoo, sglee}@europa.snu.ac.kr
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks in the main text.
Open Source Code | Yes | The implementation is made publicly available: https://github.com/jihunchoi/unsupervised-treelstm
Open Datasets | Yes | In the Stanford Natural Language Inference (SNLI) dataset (Bowman et al. 2015), which we use for NLI experiments, a relationship is either contradiction, entailment, or neutral. The SNLI dataset is composed of about 550,000 sentences... we conducted experiments on Stanford Sentiment Treebank (SST) (Socher et al. 2013) dataset.
Dataset Splits | Yes | In the Stanford Natural Language Inference (SNLI) dataset (Bowman et al. 2015)... hyperparameters are tuned using the validation split.
Hardware Specification | Yes | All of our models converged within a few hours on a machine with NVIDIA Titan Xp GPU.
Software Dependencies | No | The paper mentions the "cuDNN library" but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | For 100D experiments (where Dx = Dh = 100), we use a single-hidden layer MLP with 200 hidden units (i.e. Dc = 200). For 300D experiments (where Dx = Dh = 300), we set the number of hidden units of a single-hidden layer MLP to 1024 (Dc = 1024) and added batch normalization layers followed by dropout with probability 0.1. The size of mini-batches is set to 128 in all experiments, and hyperparameters are tuned using the validation split. The temperature parameter τ of Gumbel-Softmax is set to 1.0. For training models, Adam optimizer is used.
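The quoted experiment setup maps onto a handful of standard components. Below is a minimal PyTorch sketch of the 300D configuration (Dx = Dh = 300, Dc = 1024, batch normalization, dropout 0.1, Gumbel-Softmax with τ = 1.0, Adam); the class and function names, the 4·Dh feature concatenation for the SNLI classifier, and the straight-through sampling variant are assumptions for illustration, not taken from the released implementation linked above.

# Hypothetical sketch of the quoted 300D setup; names, the 4 * D_H input size,
# and the straight-through sampling are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_X = D_H = 300   # word embedding and hidden sizes quoted for 300D experiments
D_C = 1024        # hidden units of the single-hidden-layer classifier MLP

class ClassifierMLP(nn.Module):
    # Single-hidden-layer MLP with batch normalization and dropout 0.1,
    # matching the quoted 300D description (3-way output assumed for SNLI).
    def __init__(self, in_dim, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, D_C),
            nn.BatchNorm1d(D_C),
            nn.ReLU(),
            nn.Dropout(p=0.1),
            nn.Linear(D_C, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def sample_merge_position(merge_scores, tau=1.0):
    # Gumbel-Softmax over candidate merge positions with temperature tau = 1.0,
    # as quoted; hard=True yields straight-through (discrete) samples.
    return F.gumbel_softmax(merge_scores, tau=tau, hard=True)

# Classifier input assumed to be [h_p; h_h; |h_p - h_h|; h_p * h_h] -> 4 * D_H.
model = ClassifierMLP(in_dim=4 * D_H)
optimizer = torch.optim.Adam(model.parameters())  # Adam, mini-batches of 128 as quoted

The Tree-LSTM composition function and the parser that produces merge_scores are omitted here; the sketch only wires in the hyperparameters quoted in the Experiment Setup row.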