Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
A STRUCTURED SELF-ATTENTIVE SENTENCE EMBEDDING
Authors: Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on 3 different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks. |
| Researcher Affiliation | Collaboration | IBM Watson Montreal Institute for Learning Algorithms (MILA), Université de Montréal CIFAR Senior Fellow EMAIL EMAIL |
| Pseudocode | No | The paper does not contain explicit 'Pseudocode' or 'Algorithm' blocks. It describes the model with equations and diagrams but not structured code-like steps. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "The Author Profiling dataset^1 consists of Twitter tweets in English, Spanish, and Dutch." (footnote 1: http://pan.webis.de/clef16/pan16-web/author-profiling.html) and "We choose the Yelp dataset^2 for sentiment analysis task." (footnote 2: https://www.yelp.com/datasetchallenge) and "We use the biggest dataset in textual entailment, the SNLI corpus (Bowman et al., 2015)" |
| Dataset Splits | Yes | "We randomly selected 68485 tweets as training set, 4000 for development set, and 4000 for test set." and "We randomly select 500K review-star pairs as training set, and 2000 for development set, 2000 for test set." |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments, only general setup parameters. |
| Software Dependencies | No | The paper mentions 'Theano' and 'Lasagne' and 'Stanford tokenizer' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | "During training we use 0.5 dropout on the MLP and 0.0001 L2 regularization. We use stochastic gradient descent as the optimizer, with a learning rate of 0.06, batch size 16." and "our self-attention MLP has a hidden layer with 350 units (the d_a in Section 2), we choose the matrix embedding to have 30 rows (the r), and a coefficient of 1 for the penalization term." |
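
The Experiment Setup row names three quantities (d_a = 350, r = 30, and a penalization coefficient of 1) without showing where they enter the model. The following is a minimal, dependency-free sketch of how they could fit together, assuming the formulation in Section 2 of the paper: attention weights A = softmax(W_s2 tanh(W_s1 H^T)), matrix embedding M = A H, and penalty ||A A^T - I||_F^2. These formulas come from the paper, not from the table above. The toy sizes `n` and `two_u`, the random weights, and every function name are hypothetical and chosen only for illustration; this is not the authors' Theano/Lasagne implementation.

```python
import math
import random

# Values quoted in the Experiment Setup row.
D_A = 350            # hidden units of the self-attention MLP (d_a)
R = 30               # rows of the matrix embedding (r)
PENALTY_COEF = 1.0   # coefficient of the penalization term


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H, W_s1, W_s2):
    """Return (A, M): A holds r x n attention weights, M = A H is r x 2u."""
    # H is n x 2u, W_s1 is d_a x 2u, W_s2 is r x d_a.
    hidden = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]
    A = [softmax(row) for row in matmul(W_s2, hidden)]  # softmax over tokens
    return A, matmul(A, H)


def penalization(A):
    """Squared Frobenius norm of (A A^T - I)."""
    gram = matmul(A, transpose(A))
    size = len(gram)
    return sum(
        (gram[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(size)
        for j in range(size)
    )


def total_loss(task_loss, A):
    """Add the weighted penalty to a task loss computed elsewhere."""
    return task_loss + PENALTY_COEF * penalization(A)


# Toy usage with hypothetical sizes: 5 tokens, hidden states of width 8.
rng = random.Random(0)
n, two_u = 5, 8


def random_matrix(rows, cols):
    """Small random matrix standing in for learned weights or LSTM states."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


H = random_matrix(n, two_u)
A, M = self_attention(H, random_matrix(D_A, two_u), random_matrix(R, D_A))
print(len(A), len(A[0]), len(M), len(M[0]))  # 30 5 30 8
```

Each of the 30 rows of `A` is a separate distribution over the tokens, so `M` has a fixed 30 x 2u shape regardless of sentence length. The penalty reaches zero only when the rows of `A` are orthonormal, which pushes the attention hops toward different tokens.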