Quantifying the Gain in Weak-to-Strong Generalization
Authors: Moses Charikar, Chirag Pabbaraju, Kirankumar Shiragur
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical findings through various empirical assessments. ... We validate our characterization of the gain in weak-to-strong generalization through various experiments (Section 5) on synthetic and real-world data. |
| Researcher Affiliation | Collaboration | Moses Charikar, Stanford University, moses@cs.stanford.edu; Chirag Pabbaraju, Stanford University, cpabbara@cs.stanford.edu; Kirankumar Shiragur, Microsoft Research, kshiragur@microsoft.com |
| Pseudocode | No | The paper contains mathematical derivations and descriptions of procedures but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | A Python notebook representative of our main experiment (Figure 2a) is available at https://github.com/chogba/wtsg-regression. |
| Open Datasets | Yes | We consider three regression datasets: ESOL, FreeSolv and Lipop. These datasets are part of the MoleculeNet [WRF+18] benchmark suite, and have been curated into train, test and validation splits by ChemBench [Wan20]. |
| Dataset Splits | Yes | These datasets are part of the MoleculeNet [WRF+18] benchmark suite, and have been curated into train, test and validation splits by ChemBench [Wan20]. |
| Hardware Specification | Yes | All our synthetic experiments were run on a personal MacBook Pro 2021 with an Apple M1 Pro chip (10 CPU cores) and no GPUs. ... The experiments on MolBERT used 2 GPUs with 8 GB memory on an internal GPU cluster. |
| Software Dependencies | No | The paper mentions using the Adam optimizer [KB14] and models like BERT [DCLT18] but does not specify version numbers for these software components or other libraries used for the experiments. |
| Experiment Setup | Yes | All the gradient descent optimization procedures (pretraining tasks to obtain h_w, h_s, weak model finetuning, strong model finetuning on weak labels) used the Adam optimizer [KB14], with a batch size of 32, learning rate of 10⁻³ and 1000 epochs. |
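
The three regression datasets named in the Open Datasets row are standard MoleculeNet benchmarks. The sketch below shows one way to fetch them via DeepChem's MoleculeNet loaders; this is an illustrative assumption only, since the paper relies on the train/validation/test splits curated by ChemBench [Wan20], which DeepChem's splitter choices will not reproduce exactly.

```python
# Minimal sketch: loading the ESOL, FreeSolv and Lipophilicity regression
# datasets through DeepChem's MoleculeNet loaders. The splitter and
# featurizer choices here are illustrative, not the ChemBench splits
# actually used in the paper.
import deepchem as dc

loaders = {
    "ESOL": dc.molnet.load_delaney,
    "FreeSolv": dc.molnet.load_freesolv,
    "Lipop": dc.molnet.load_lipo,
}

for name, load in loaders.items():
    tasks, (train, valid, test), transformers = load(
        featurizer="ECFP", splitter="random"  # illustrative choices
    )
    print(name, len(train), len(valid), len(test))
```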
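
The Experiment Setup row pins down the optimization hyperparameters reported in the paper (Adam, batch size 32, learning rate 10⁻³, 1000 epochs). A minimal PyTorch sketch of that configuration follows; the feature dimension, data tensors, and linear regression head are placeholders rather than the paper's weak and strong models.

```python
# Minimal sketch of the reported fine-tuning configuration:
# Adam optimizer, batch size 32, learning rate 1e-3, 1000 epochs.
# Model and data are placeholders, not the paper's h_w / h_s setup.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

d = 128                                  # placeholder feature dimension
X = torch.randn(1000, d)                 # placeholder features
y = torch.randn(1000, 1)                 # placeholder regression targets
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(d, 1)                  # placeholder regression head
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(1000):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
```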