Self-Supervised Self-Supervision by Combining Deep Learning and Probabilistic Logic
Authors: Hunter Lang, Hoifung Poon
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that S4 is able to automatically propose accurate self-supervision and can often nearly match the accuracy of supervised methods with a tiny fraction of the human effort. We conducted experiments on various natural language processing (NLP) tasks to explore the potential of our method. We held out gold labels for evaluation only, and used them to simulate oracle self-supervision for initial self-supervision and active learning. We find that S4 can substantially improve over the seed self-supervision by proposing new virtual evidence, and can match the accuracy of fully supervised systems with a fraction of human effort. |
| Researcher Affiliation | Collaboration | Hunter Lang (MIT) and Hoifung Poon (Microsoft Research); hjl@mit.edu, hoifung@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Self-Supervised Self-Supervision (S4) |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | We used three standard text classification datasets: IMDb (Maas et al. 2011), Stanford Sentiment Treebank (Socher et al. 2013), and Yahoo! Answers (Zhang, Zhao, and LeCun 2015). |
| Dataset Splits | Yes | IMDb contains movie reviews with polarity labels (positive/negative). There are 25,000 training instances with equal numbers of positive and negative labels, and the same numbers for test. Stanford Sentiment Treebank (Stan Sent)... It contains 6,920 training instances and 1,821 test instances, with roughly equal split. The Yahoo dataset contains 1.4 million training questions and 60,000 test questions from Yahoo! Answers |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used to run the experiments. |
| Software Dependencies | No | The paper mentions using the 'standard BERT-base model' and the 'Adam optimizer' but does not specify version numbers for these or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For all virtual evidences, we used initial weight w = 2.2 (the log-odds of 90% probability) and used an α corresponding to an L2 penalty of 5 × 10^-8 on w. Our results are not sensitive to these values. In all experiments, we use the Adam optimizer with an initial learning rate tuned over [0.1, 0.01, 0.001]. The optimizer's history is reset after each EM iteration to remove old gradient information. We always performed 3 EM iterations and trained Ψ for 5 epochs per iteration. |
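
Below is a minimal sketch of how the reported training schedule could be reproduced in PyTorch. Only the numeric settings come from the quoted setup (initial weight w = 2.2 as the log-odds of 90%, 3 EM iterations, 5 epochs of training for Ψ per iteration, Adam with a learning rate tuned over [0.1, 0.01, 0.001], optimizer history reset each iteration); the helper names `psi_model`, `train_loader`, `e_step_relabel`, and `loss_fn` are hypothetical placeholders, not interfaces from the paper's implementation.

```python
import math
import torch

# Initial virtual-evidence weight: the log-odds of 90% probability,
# log(0.9 / 0.1) ≈ 2.197, which the paper rounds to 2.2.
INIT_WEIGHT = math.log(0.9 / 0.1)

NUM_EM_ITERS = 3       # "3 EM iterations"
EPOCHS_PER_ITER = 5    # "trained Ψ for 5 epochs per iteration"
LEARNING_RATE = 0.001  # one value from the tuned grid [0.1, 0.01, 0.001]


def run_em(psi_model, train_loader, e_step_relabel, loss_fn):
    """Hypothetical EM training loop following the reported schedule.

    `psi_model`, `train_loader`, `e_step_relabel`, and `loss_fn` are
    assumed placeholders, not components taken from the paper's code.
    """
    for em_iter in range(NUM_EM_ITERS):
        # E-step: recompute soft labels for the training set from the
        # current virtual evidence and prediction module.
        soft_labels = e_step_relabel(psi_model)

        # Re-creating the optimizer each EM iteration resets Adam's
        # moment estimates, i.e., "the optimizer's history is reset
        # after each EM iteration to remove old gradient information."
        optimizer = torch.optim.Adam(psi_model.parameters(), lr=LEARNING_RATE)

        # M-step: train the prediction module Ψ for 5 epochs.
        for _ in range(EPOCHS_PER_ITER):
            for batch, idx in train_loader:
                optimizer.zero_grad()
                loss = loss_fn(psi_model(batch), soft_labels[idx])
                loss.backward()
                optimizer.step()

    return psi_model
```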