A STRUCTURED SELF-ATTENTIVE SENTENCE EMBEDDING
Authors: Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks. |
| Researcher Affiliation | Collaboration | IBM Watson; Montreal Institute for Learning Algorithms (MILA), Université de Montréal; CIFAR Senior Fellow; lin.zhouhan@gmail.com {mfeng, cicerons, yum, bingxia, zhou}@us.ibm.com |
| Pseudocode | No | The paper does not contain explicit 'Pseudocode' or 'Algorithm' blocks. It describes the model with equations and diagrams but not structured code-like steps. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The Author Profiling dataset^1 consists of Twitter tweets in English, Spanish, and Dutch. (footnote 1 points to http://pan.webis.de/clef16/pan16-web/author-profiling.html) and We choose the Yelp dataset^2 for sentiment analysis task. (footnote 2 points to https://www.yelp.com/dataset_challenge) and We use the biggest dataset in textual entailment, the SNLI corpus (Bowman et al., 2015) |
| Dataset Splits | Yes | We randomly selected 68485 tweets as training set, 4000 for development set, and 4000 for test set. and We randomly select 500K review-star pairs as training set, and 2000 for development set, 2000 for test set. (a minimal split sketch is given after the table) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments, only general setup parameters. |
| Software Dependencies | No | The paper mentions 'Theano', 'Lasagne', and the 'Stanford tokenizer', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | During training we use 0.5 dropout on the MLP and 0.0001 L2 regularization. We use stochastic gradient descent as the optimizer, with a learning rate of 0.06, batch size 16. and our self-attention MLP has a hidden layer with 350 units (the d_a in Section 2), we choose the matrix embedding to have 30 rows (the r), and a coefficient of 1 for the penalization term. (a hedged model/training sketch follows the table) |
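The dataset splits quoted above (68485 / 4000 / 4000 tweets for Author Profiling; 500K / 2000 / 2000 review-star pairs for Yelp) are plain random partitions. Below is a minimal sketch of such a split, assuming the examples are already loaded into a Python list; the function name, seed, and the `tweets` variable are illustrative and not from the paper.

```python
import random


def random_split(examples, n_train, n_dev, n_test, seed=0):
    """Randomly partition `examples` into train/dev/test lists.

    The sizes mirror the Author Profiling split quoted above
    (68485 / 4000 / 4000); the seed value is arbitrary, not from the paper.
    """
    assert n_train + n_dev + n_test <= len(examples)
    rng = random.Random(seed)
    shuffled = list(examples)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:n_train + n_dev + n_test]
    return train, dev, test


# Hypothetical usage with the Author Profiling sizes; `tweets` would be a
# list of (tweet_text, label) pairs loaded from the PAN 2016 data.
# train, dev, test = random_split(tweets, 68485, 4000, 4000)
```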
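For the Experiment Setup row, the quoted hyperparameters (d_a = 350, r = 30, penalization coefficient 1, 0.5 dropout on the MLP, 0.0001 L2 regularization, SGD with learning rate 0.06, batch size 16) map onto the paper's self-attention mechanism A = softmax(W_s2 tanh(W_s1 H^T)), M = AH, with penalty ||AA^T - I||_F^2. The sketch below is a hedged PyTorch re-implementation, not the authors' Theano/Lasagne code; the vocabulary size, embedding dimension, LSTM width, MLP width, and class count are placeholders, and the L2 term is expressed as SGD weight decay.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentiveEmbedding(nn.Module):
    """Structured self-attentive sentence embedding (hedged sketch).

    d_a = 350 and r = 30 follow the Experiment Setup row above; the
    remaining sizes (vocab, embedding, LSTM, MLP, classes) are placeholders.
    """

    def __init__(self, vocab_size=20000, emb_dim=100, lstm_dim=300,
                 d_a=350, r=30, mlp_dim=2000, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, lstm_dim, bidirectional=True,
                            batch_first=True)
        self.w_s1 = nn.Linear(2 * lstm_dim, d_a, bias=False)   # W_s1
        self.w_s2 = nn.Linear(d_a, r, bias=False)               # W_s2
        self.mlp = nn.Sequential(
            nn.Linear(r * 2 * lstm_dim, mlp_dim),
            nn.ReLU(),
            nn.Dropout(0.5),        # "0.5 dropout on the MLP"
            nn.Linear(mlp_dim, n_classes),
        )

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        H, _ = self.lstm(self.embed(tokens))                  # (B, n, 2u)
        scores = self.w_s2(torch.tanh(self.w_s1(H)))          # (B, n, r)
        A = F.softmax(scores, dim=1).transpose(1, 2)          # (B, r, n), softmax over tokens
        M = torch.bmm(A, H)                                   # (B, r, 2u) sentence embedding
        logits = self.mlp(M.flatten(1))
        # Penalization term P = ||A A^T - I||_F^2, coefficient 1 per the paper.
        I = torch.eye(A.size(1), device=A.device)
        penalty = ((torch.bmm(A, A.transpose(1, 2)) - I) ** 2).sum(dim=(1, 2)).mean()
        return logits, penalty


# Training configuration mirroring the quoted setup: SGD, learning rate 0.06,
# batch size 16, and 0.0001 L2 regularization (expressed here as weight decay).
model = SelfAttentiveEmbedding()
optimizer = torch.optim.SGD(model.parameters(), lr=0.06, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()


def training_step(tokens, labels):
    """One SGD step; the penalization term is added with coefficient 1."""
    logits, penalty = model(tokens)
    loss = criterion(logits, labels) + 1.0 * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```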