Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
No Training Required: Exploring Random Encoders for Sentence Classification
Authors: John Wieting, Douwe Kiela
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore various methods for computing sentence representations from pretrained word embeddings without any training, i.e., using nothing but random parameterizations. In our experiments, we evaluate on a standard sentence representation benchmark using SentEval (Conneau & Kiela, 2018). |
| Researcher Affiliation | Collaboration | John Wieting Carnegie Mellon University EMAIL Douwe Kiela Facebook AI Research EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/facebookresearch/randsent. |
| Open Datasets | Yes | We use the publicly available 300-dimensional GloVe embeddings (Pennington et al., 2014) trained on Common Crawl for all experiments. The set of downstream tasks we use for evaluation comprises sentiment analysis (MR, SST), question-type (TREC), product reviews (CR), subjectivity (SUBJ), opinion polarity (MPQA), paraphrasing (MRPC), entailment (SICK-E, SNLI) and semantic relatedness (SICK-R, STSB). The probing tasks consist of those in Conneau et al. (2018). |
| Dataset Splits | Yes | We compute the average accuracy/Pearson's r, along with the standard deviation, over 5 different seeds for the random methods, and tune on validation for each task. Training is stopped when validation performance has not increased 5 times. Checks for validation performance occur every 4 epochs. |
| Hardware Specification | No | The paper mentions 'fit things onto a modern GPU' but does not provide specific details about the hardware used for experiments, such as GPU/CPU models, processors, or memory. |
| Software Dependencies | No | The paper mentions using 'SentEval' and 'Adam' for optimization but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use the default SentEval settings, which are to train with a logistic regression classifier, use a batch size of 64, a maximum number of epochs of 200 with early stopping, no dropout, and use Adam (Kingma & Ba, 2014) for optimization with a learning rate of 0.001. For the ESNs, we only tune whether to use a ReLU or no activation function, the spectral radius from {0.4, 0.6, 0.8, 1.0}, the range of the uniform distribution for initializing W^i where the max distance from zero is selected from {0.01, 0.05, 0.1, 0.2}, and finally the fraction of elements in W^h that are set to 0, i.e., sparsity, is selected from {0, 0.25, 0.5, 0.75}. |
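
The ESN search space quoted in the Experiment Setup row can be written out as a short sketch. This is an illustration only, not the authors' code (see the linked repository for that). The names `esn_grid` and `init_esn_weights` are hypothetical. Three choices are assumptions not stated in the quoted text: that the search is an exhaustive grid, that the recurrent matrix is drawn from a uniform distribution on [-1, 1] before rescaling, and that sparsity is applied with an independent random mask.

```python
import itertools

import numpy as np

# Search space quoted in the Experiment Setup row.
ACTIVATIONS = ("relu", "none")
SPECTRAL_RADII = (0.4, 0.6, 0.8, 1.0)
INIT_RANGES = (0.01, 0.05, 0.1, 0.2)  # max distance from zero for the input weights
SPARSITIES = (0.0, 0.25, 0.5, 0.75)   # fraction of recurrent weights set to 0


def esn_grid():
    """Enumerate every configuration (assumes an exhaustive grid search)."""
    keys = ("activation", "spectral_radius", "init_range", "sparsity")
    values = (ACTIVATIONS, SPECTRAL_RADII, INIT_RANGES, SPARSITIES)
    return [dict(zip(keys, combo)) for combo in itertools.product(*values)]


def init_esn_weights(input_dim, hidden_dim, cfg, seed=0):
    """Hypothetical helper: draw untrained ESN weights for one configuration."""
    rng = np.random.default_rng(seed)

    # Input weights: uniform in [-init_range, init_range].
    r = cfg["init_range"]
    w_in = rng.uniform(-r, r, size=(hidden_dim, input_dim))

    # Recurrent weights: uniform in [-1, 1] before rescaling (an assumption).
    w_h = rng.uniform(-1.0, 1.0, size=(hidden_dim, hidden_dim))

    # Zero out roughly a `sparsity` fraction of the recurrent weights.
    keep = rng.random((hidden_dim, hidden_dim)) >= cfg["sparsity"]
    w_h = w_h * keep

    # Rescale so the largest absolute eigenvalue matches the target spectral radius.
    rho = np.max(np.abs(np.linalg.eigvals(w_h)))
    if rho > 0:
        w_h = w_h * (cfg["spectral_radius"] / rho)
    return w_in, w_h
```

Under the exhaustive-grid assumption, `esn_grid()` yields 2 × 4 × 4 × 4 = 128 configurations per task; each would then be evaluated over the 5 seeds mentioned in the Dataset Splits row.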