Theoretical Analysis of Weak-to-Strong Generalization
Authors: Hunter Lang, David Sontag, Aravindan Vijayaraghavan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Setup (from Section 6, Experiments): We explore training linear classifiers on top of contrastively fine-tuned Sentence-BERT embeddings [52]. As shown in Muennighoff et al. [44], training simple classifiers on top of these complex pretrained representations leads to very competitive performance. We study binary sentiment prediction for movie reviews on the IMDb dataset [41], continuing with the example from Section 3. (This setup is sketched below the table.) |
| Researcher Affiliation | Academia | Hunter Lang (MIT CSAIL), David Sontag (MIT CSAIL), Aravindan Vijayaraghavan (Northwestern University) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for reproducing our experimental results is included in the supplemental material. |
| Open Datasets | Yes | We train on the IMDb dataset of movie reviews [41] (Hugging Face Hub ID stanfordnlp/imdb) |
| Dataset Splits | Yes | We train on the IMDb dataset of movie reviews [41] (Hugging Face Hub ID stanfordnlp/imdb), which has 25,000 training examples and 25,000 test examples, each with an exact 50/50 positive/negative split... and we retrain a model 5 times on 5 different random subsets of the covered training samples, each 80% of the original, and use the other 20% of covered samples as a validation set to perform early stopping with the weak labels. (This protocol is sketched below the table.) |
| Hardware Specification | Yes | We used an internal machine with 4x A100 80GB GPUs to extract all deep network embeddings and to train the linear classifiers on top of those embeddings. |
| Software Dependencies | No | The paper mentions the AdamW optimizer and Sentence-BERT embeddings but does not provide version numbers for general software dependencies such as Python or PyTorch. |
| Experiment Setup | Yes | We train the linear classifiers using the AdamW optimizer [40] with global learning rate 0.01, a weight decay of 0.1, and linear learning rate decay over 500 optimizer steps. (Sketched below the table.) |
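
The quoted setup (Research Type and Experiment Setup rows) maps to a short training script. The following is a minimal sketch, not the authors' released code: the Sentence-BERT checkpoint (`all-mpnet-base-v2`) and the batch size of 256 are assumptions, while the AdamW hyperparameters (learning rate 0.01, weight decay 0.1, linear decay over 500 optimizer steps) follow the quoted text. The encoder is used only as a frozen feature extractor, matching the paper's setting of training simple classifiers on top of pretrained representations.

```python
# Sketch of the quoted setup: a linear classifier over frozen
# Sentence-BERT embeddings, trained with AdamW (lr 0.01, weight decay
# 0.1) and linear LR decay over 500 steps. Checkpoint name and batch
# size are assumptions, not from the paper.
import torch
from torch import nn
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

dataset = load_dataset("stanfordnlp/imdb", split="train")
encoder = SentenceTransformer("all-mpnet-base-v2")  # assumed checkpoint

# Embed the reviews once; the encoder stays frozen throughout.
X = torch.tensor(encoder.encode(dataset["text"], batch_size=64))
y = torch.tensor(dataset["label"])

clf = nn.Linear(X.shape[1], 2)
opt = torch.optim.AdamW(clf.parameters(), lr=0.01, weight_decay=0.1)
sched = torch.optim.lr_scheduler.LinearLR(
    opt, start_factor=1.0, end_factor=0.0, total_iters=500
)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    idx = torch.randint(0, X.shape[0], (256,))  # assumed batch size
    loss = loss_fn(clf(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
```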
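
The Dataset Splits row describes a retraining protocol: 5 runs, each on a random 80% subset of the covered (weakly labeled) training examples, with the held-out 20% scored against the weak labels for early stopping. A self-contained sketch of that protocol follows; the epoch count and full-batch updates are illustrative assumptions, not from the paper.

```python
# Sketch of the quoted split protocol: retrain 5 times on random 80%
# subsets of the covered examples (X_cov, y_weak), early-stopping on
# the held-out 20% scored with the *weak* labels. Epoch count and
# full-batch updates are assumptions; lr/wd follow the quoted setup.
import copy
import torch
from torch import nn

def train_with_weak_early_stopping(X_cov, y_weak, seeds=range(5),
                                   epochs=20, lr=0.01, wd=0.1):
    models = []
    for seed in seeds:
        g = torch.Generator().manual_seed(seed)
        perm = torch.randperm(len(X_cov), generator=g)
        cut = int(0.8 * len(X_cov))
        tr, va = perm[:cut], perm[cut:]  # 80% train / 20% validation

        clf = nn.Linear(X_cov.shape[1], 2)
        opt = torch.optim.AdamW(clf.parameters(), lr=lr, weight_decay=wd)
        loss_fn = nn.CrossEntropyLoss()

        best_acc, best_state = -1.0, None
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(clf(X_cov[tr]), y_weak[tr]).backward()
            opt.step()
            # Validation accuracy against the weak labels, since the
            # true labels are unavailable in the weak-to-strong setting.
            with torch.no_grad():
                acc = (clf(X_cov[va]).argmax(1) == y_weak[va]).float().mean()
            if acc > best_acc:  # keep the best weak-validated model
                best_acc = acc
                best_state = copy.deepcopy(clf.state_dict())

        clf.load_state_dict(best_state)
        models.append(clf)
    return models
```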