Unsupervised Learning of Discourse Structures using a Tree Autoencoder
Authors: Patrick Huber, Giuseppe Carenini
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To fully evaluate the performance of our T-AE method, we conduct experiments on three distinct tasks, focusing on the two learning goals of our model: (1) Evaluating if the model is able to infer valuable and general discourse-structures and (2) Assessing the ability of the model to learn task-independent hidden states, capturing important relationships between instances. [...] Table 1 shows the results on the first task, evaluating our model on RST-style discourse structures from the RST-DT treebank. |
| Researcher Affiliation | Academia | Patrick Huber, Giuseppe Carenini Department of Computer Science, University of British Columbia Vancouver, BC, Canada {huberpat, carenini}@cs.ubc.ca |
| Pseudocode | No | The paper describes the model's components using text and mathematical equations (e.g., equations 1-4), but it does not include a distinct section or figure labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | The RST-DT Treebank published by Carlson, Marcu, and Okurowski (2003) is the most popular RST treebank. [...] The Yelp 13 Dataset by Tang, Qin, and Liu (2015) is a review dataset published as part of the 2013 Yelp Dataset Challenge. |
| Dataset Splits | Yes | The RST-DT Treebank published by Carlson, Marcu, and Okurowski (2003)... split into 344 documents in the training-set and 39 documents in the test-portion. In order to obtain a development set, we subdivide the training-portion into 308 documents for training and 36 documents for a length-stratified development set. [...] The complete dataset contains 335,018 documents in an 80-10-10 datasplit, resulting in 268,014 training documents and 33,502 documents each in the development and test sets. (A minimal split-size sketch follows below the table.) |
| Hardware Specification | Yes | Trained on a Nvidia GTX 1080 Ti GPU with 11GB of memory. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'GloVe word-embedding' but does not specify version numbers for these or any other software components. |
| Experiment Setup | Yes | We train all models using the Adam optimizer (Kingma and Ba 2014) with the standard learning rate of 0.001. ... We train our model on mini-batches of size 20 ... and apply regularization in form of 20% dropout on the input embeddings, the document-level hidden state and the output embeddings ... We clip gradients to a max norm of 2.0 to avoid exploding gradients. Documents are limited to 150 EDUs per document and a maximum of 50 words per EDU... We restrict the vocabulary size to the most frequent 50,000 words with an additional minimal frequency requirement of 10. We train the sentence- and document-level model for 40 epochs and select the best performing generation on the development set. The hidden dimension of our LSTM modules as well as the pointer component is set to 64... To be able to explore diverse tree candidates in early epochs and further improve them during later epochs, we start with the diversity factor τ = 5 and linearly reduce the parameter to τ = 1... over 3 structure-learning epochs. (A hedged training-setup sketch follows below the table.) |
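
The split counts quoted in the Dataset Splits row can be checked with a short script. The following Python sketch is illustrative only: the function names (`split_sizes`, `length_stratified_dev_split`) and the every-k-th-document stratification heuristic are assumptions, since the paper reports just the resulting counts (344 → 308/36 for RST-DT; 335,018 → 268,014/33,502/33,502 for Yelp 13).

```python
def split_sizes(n_docs, train_frac=0.8):
    """80-10-10 split as reported for Yelp 13: 335,018 documents ->
    268,014 train and 33,502 each for dev and test."""
    n_train = round(n_docs * train_frac)
    n_dev = (n_docs - n_train) // 2
    n_test = n_docs - n_train - n_dev
    return n_train, n_dev, n_test


def length_stratified_dev_split(docs, n_dev=36):
    """Carve a length-stratified dev set out of the RST-DT training portion
    (344 documents -> 308 train / 36 dev). The paper reports only the
    resulting counts; taking every k-th document from the length-sorted
    list is a stand-in for the unspecified stratification procedure."""
    by_length = sorted(docs, key=len)        # documents as lists of EDUs
    step = max(1, len(by_length) // n_dev)   # 344 // 36 = 9
    dev = by_length[::step][:n_dev]
    dev_ids = {id(d) for d in dev}
    train = [d for d in by_length if id(d) not in dev_ids]
    return train, dev


print(split_sizes(335_018))  # (268014, 33502, 33502)
```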
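
The hyperparameters quoted in the Experiment Setup row translate directly into a training configuration. Below is a minimal PyTorch-style sketch under stated assumptions: the `TrainConfig` dataclass, the linear reading of the τ annealing schedule, and the `train` skeleton (with `model` and `batches` as stand-ins for the unreleased T-AE model and data pipeline) are assumptions, not the authors' implementation; only the numeric values come from the paper.

```python
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class TrainConfig:
    # Values reported in the paper's experiment setup.
    learning_rate: float = 1e-3     # Adam, standard learning rate
    batch_size: int = 20
    dropout: float = 0.2            # input embeddings, doc-level state, output embeddings
    max_grad_norm: float = 2.0
    max_edus_per_doc: int = 150
    max_words_per_edu: int = 50
    vocab_size: int = 50_000        # most frequent words, minimum frequency 10
    min_word_freq: int = 10
    hidden_dim: int = 64            # LSTM modules and pointer component
    epochs: int = 40
    tau_start: float = 5.0          # diversity factor, annealed linearly
    tau_end: float = 1.0
    tau_anneal_epochs: int = 3


def tau_schedule(epoch: int, cfg: TrainConfig) -> float:
    """Linear annealing of the diversity factor tau; one reading of the
    paper's 'tau = 5 reduced to tau = 1 over 3 structure-learning epochs'."""
    if epoch >= cfg.tau_anneal_epochs:
        return cfg.tau_end
    frac = epoch / cfg.tau_anneal_epochs
    return cfg.tau_start + (cfg.tau_end - cfg.tau_start) * frac


def train(model: nn.Module, batches, cfg: TrainConfig = TrainConfig()):
    """Skeleton training loop; `model` and `batches` are hypothetical
    stand-ins and `model` is assumed to return its reconstruction loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)
    for epoch in range(cfg.epochs):
        tau = tau_schedule(epoch, cfg)
        for batch in batches:
            optimizer.zero_grad()
            loss = model(batch, tau=tau)
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), cfg.max_grad_norm)
            optimizer.step()
```

Model selection ("select the best performing generation on the development set") would sit outside this loop and is omitted here.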