Self-Supervised Self-Supervision by Combining Deep Learning and Probabilistic Logic
Authors: Hunter Lang, Hoifung Poon
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that S4 is able to automatically propose accurate self-supervision and can often nearly match the accuracy of supervised methods with a tiny fraction of the human effort. We conducted experiments on various natural language processing (NLP) tasks to explore the potential of our method. We held out gold labels for evaluation only, and used them to simulate oracle self-supervision for initial self-supervision and active learning. We find that S4 can substantially improve over the seed self-supervision by proposing new virtual evidence, and can match the accuracy of fully supervised systems with a fraction of human effort. |
| Researcher Affiliation | Collaboration | Hunter Lang (MIT) and Hoifung Poon (Microsoft Research); hjl@mit.edu, hoifung@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Self-Supervised Self-Supervision (S4) |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | We used three standard text classification datasets: IMDb (Maas et al. 2011), Stanford Sentiment Treebank (Socher et al. 2013), and Yahoo! Answers (Zhang, Zhao, and LeCun 2015). |
| Dataset Splits | Yes | IMDb contains movie reviews with polarity labels (positive/negative). There are 25,000 training instances with equal numbers of positive and negative labels, and the same numbers for test. Stanford Sentiment Treebank (Stan Sent)... It contains 6,920 training instances and 1,821 test instances, with roughly equal split. The Yahoo dataset contains 1.4 million training questions and 60,000 test questions from Yahoo! Answers |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used to run the experiments. |
| Software Dependencies | No | The paper mentions using the 'standard BERT-base model' and the 'Adam optimizer' but does not specify version numbers for these or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For all virtual evidences, we used initial weight w = 2.2 (the log-odds of 90% probability) and used an α corresponding to an L2 penalty of 5 × 10^-8 on w. Our results are not sensitive to these values. In all experiments, we use the Adam optimizer with an initial learning rate tuned over [0.1, 0.01, 0.001]. The optimizer's history is reset after each EM iteration to remove old gradient information. We always performed 3 EM iterations and trained Ψ for 5 epochs per iteration. |
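
Below is a minimal sketch of how the reported training schedule could be reproduced in PyTorch. Only the numeric settings come from the quoted setup (initial weight w = 2.2 as the log-odds of 90%, 3 EM iterations, 5 epochs of training for Ψ per iteration, Adam with a learning rate tuned over [0.1, 0.01, 0.001], optimizer history reset each iteration); the helper names `psi_model`, `train_loader`, `e_step_relabel`, and `loss_fn` are hypothetical placeholders, not interfaces from the paper's implementation.

```python
import math
import torch

# Initial virtual-evidence weight: the log-odds of 90% probability,
# log(0.9 / 0.1) ≈ 2.197, which the paper rounds to 2.2.
INIT_WEIGHT = math.log(0.9 / 0.1)

NUM_EM_ITERS = 3       # "3 EM iterations"
EPOCHS_PER_ITER = 5    # "trained Ψ for 5 epochs per iteration"
LEARNING_RATE = 0.001  # one value from the tuned grid [0.1, 0.01, 0.001]


def run_em(psi_model, train_loader, e_step_relabel, loss_fn):
    """Hypothetical EM training loop following the reported schedule.

    `psi_model`, `train_loader`, `e_step_relabel`, and `loss_fn` are
    assumed placeholders, not components taken from the paper's code.
    """
    for em_iter in range(NUM_EM_ITERS):
        # E-step: recompute soft labels for the training set from the
        # current virtual evidence and prediction module.
        soft_labels = e_step_relabel(psi_model)

        # Re-creating the optimizer each EM iteration resets Adam's
        # moment estimates, i.e., "the optimizer's history is reset
        # after each EM iteration to remove old gradient information."
        optimizer = torch.optim.Adam(psi_model.parameters(), lr=LEARNING_RATE)

        # M-step: train the prediction module Ψ for 5 epochs.
        for _ in range(EPOCHS_PER_ITER):
            for batch, idx in train_loader:
                optimizer.zero_grad()
                loss = loss_fn(psi_model(batch), soft_labels[idx])
                loss.backward()
                optimizer.step()

    return psi_model
```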