reproducibilityindex.ai

Quantifying the Gain in Weak-to-Strong Generalization

Authors: Moses Charikar, Chirag Pabbaraju, Kirankumar Shiragur

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our theoretical findings through various empirical assessments. ... We validate our characterization of the gain in weak-to-strong generalization through various experiments (Section 5) on synthetic and real-world data.
Researcher Affiliation	Collaboration	Moses Charikar, Stanford University, moses@cs.stanford.edu; Chirag Pabbaraju, Stanford University, cpabbara@cs.stanford.edu; Kirankumar Shiragur, Microsoft Research, kshiragur@microsoft.com
Pseudocode	No	The paper contains mathematical derivations and descriptions of procedures but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	A Python notebook representative of our main experiment (Figure 2a) is available at https://github.com/chogba/wtsg-regression.
Open Datasets	Yes	We consider three regression datasets: ESOL, FreeSolv and Lipop. These datasets are part of the MoleculeNet [WRF+18] benchmark suite, and have been curated into train, test and validation splits by ChemBench [Wan20].
Dataset Splits	Yes	These datasets are part of the MoleculeNet [WRF+18] benchmark suite, and have been curated into train, test and validation splits by ChemBench [Wan20].
Hardware Specification	Yes	All our synthetic experiments were run on a personal MacBook Pro 2021 with an Apple M1 Pro Chip (10 CPU cores) and no GPUs. ... The experiments on MolBERT used 2 GPUs with 8 GB memory on an internal GPU cluster.
Software Dependencies	No	The paper mentions using the Adam optimizer [KB14] and models like BERT [DCLT18] but does not specify version numbers for these software components or other libraries used for the experiments.
Experiment Setup	Yes	All the gradient descent optimization procedures (pretraining tasks to obtain h_w, h_s, weak model finetuning, strong model finetuning on weak labels) used the Adam optimizer [KB14], with a batch size of 32, learning rate of 10^-3 and 1000 epochs.
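
The Experiment Setup row can be captured as a small configuration object. The following is a minimal sketch, not the authors' code: the names TrainConfig, adam_step and total_steps are hypothetical, the learning rate is read as 10^-3, and the Adam moment coefficients and epsilon are the commonly used defaults, which the quoted text does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row."""
    batch_size: int = 32
    learning_rate: float = 1e-3
    epochs: int = 1000
    # Assumption: common Adam defaults; the quoted text does not give them.
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8


def adam_step(param, grad, m, v, t, cfg):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad          # first-moment estimate
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - cfg.beta1 ** t)                    # bias correction
    v_hat = v / (1 - cfg.beta2 ** t)
    param = param - cfg.learning_rate * m_hat / (math.sqrt(v_hat) + cfg.eps)
    return param, m, v


def total_steps(num_examples, cfg):
    """Optimizer steps over the full run, counting a final partial batch."""
    return cfg.epochs * math.ceil(num_examples / cfg.batch_size)
```

With these settings, total_steps(n, TrainConfig()) gives a rough sense of the compute involved for a dataset of n training examples. Exact library versions are not reported (see the Software Dependencies row), so the sketch avoids depending on any particular framework.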