Learning to Compose Task-Specific Tree Structures
Authors: Jihun Choi, Kang Min Yoo, Sang-goo Lee
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed model on natural language inference and sentiment analysis, and show that our model outperforms or is at least comparable to previous models. From experiments on natural language inference and sentiment analysis tasks, we find that our proposed model outperforms or is at least comparable to previous sentence encoder models and converges significantly faster than them. |
| Researcher Affiliation | Academia | Jihun Choi, Kang Min Yoo, Sang-goo Lee; Seoul National University, Seoul 08826, Korea; {jhchoi, kangminyoo, sglee}@europa.snu.ac.kr |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks in the main text. |
| Open Source Code | Yes | The implementation is made publicly available: https://github.com/jihunchoi/unsupervised-treelstm |
| Open Datasets | Yes | In the Stanford Natural Language Inference (SNLI) dataset (Bowman et al. 2015), which we use for NLI experiments, a relationship is either contradiction, entailment, or neutral. The SNLI dataset is composed of about 550,000 sentences... we conducted experiments on Stanford Sentiment Treebank (SST) (Socher et al. 2013) dataset. |
| Dataset Splits | Yes | In the Stanford Natural Language Inference (SNLI) dataset (Bowman et al. 2015)... hyperparameters are tuned using the validation split. |
| Hardware Specification | Yes | All of our models converged within a few hours on a machine with NVIDIA Titan Xp GPU. |
| Software Dependencies | No | The paper mentions the cuDNN library but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | For 100D experiments (where Dx = Dh = 100), we use a single-hidden layer MLP with 200 hidden units (i.e. Dc = 200). For 300D experiments (where Dx = Dh = 300), we set the number of hidden units of a single-hidden layer MLP to 1024 (Dc = 1024) and added batch normalization layers followed by dropout with probability 0.1. The size of mini-batches is set to 128 in all experiments, and hyperparameters are tuned using the validation split. The temperature parameter τ of Gumbel-Softmax is set to 1.0. For training models, Adam optimizer is used. |
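
The Experiment Setup row above lists concrete hyperparameters (layer sizes, batch normalization and dropout, batch size, Gumbel-Softmax temperature, Adam). The following is a minimal PyTorch sketch of the 300D configuration under those stated values; the identifiers (`D_H`, `D_C`, `classifier`), the activation choice, the layer ordering, and the 4·D_h SNLI feature vector are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Values taken from the Experiment Setup row (300D setting); names are assumed.
D_H = 300      # encoder output size (D_h = D_x = 300)
D_C = 1024     # MLP hidden units (D_c) for the 300D setting
N_CLASSES = 3  # SNLI: entailment / contradiction / neutral
TAU = 1.0      # Gumbel-Softmax temperature
BATCH = 128    # mini-batch size stated in the paper

# Single-hidden-layer MLP with batch normalization followed by dropout(0.1).
# ReLU and the exact placement of BatchNorm/Dropout are assumptions.
classifier = nn.Sequential(
    nn.Linear(4 * D_H, D_C),   # assumes [h_p; h_h; h_p - h_h; h_p * h_h] features
    nn.BatchNorm1d(D_C),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(D_C, N_CLASSES),
)

# Straight-through Gumbel-Softmax with tau = 1.0, the relaxation the paper uses
# to sample which candidate parent node to keep when composing the tree.
scores = torch.randn(BATCH, 7)                    # 7 hypothetical candidate merges
onehot = F.gumbel_softmax(scores, tau=TAU, hard=True)

optimizer = torch.optim.Adam(classifier.parameters())
```
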
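The Open Datasets and Dataset Splits rows name SNLI and SST with their standard train/validation/test splits. The snippet below is one hedged way to fetch those splits with the HuggingFace `datasets` package; the paper does not prescribe any particular loader, and the dataset identifiers (`"snli"`, `"sst2"`) reflect current hub naming rather than anything in the paper.

```python
from datasets import load_dataset

# SNLI ships with train / validation / test splits; pairs without a gold label
# are marked with label == -1 and are typically dropped before training.
snli = load_dataset("snli")
snli = snli.filter(lambda ex: ex["label"] != -1)
print(snli["train"].features["label"].names)  # ['entailment', 'neutral', 'contradiction']

# Binary Stanford Sentiment Treebank; the fine-grained (5-class) treebank used
# in the paper's SST-5 experiments must be obtained from the original SST release.
sst2 = load_dataset("sst2")
```
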