Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks

Authors: Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, Frank Hutter

NeurIPS 2023

Reproducibility variables, each with the extracted result and the supporting LLM response:

Research Type: Experimental
LLM Response: Our experiments aim to test the hypothesis that PFNs present a practical Bayesian approach to learning curve extrapolation. To this end, we first compare our LC-PFN approach against the MCMC approach of Domhan et al. [2015], using the same prior, on samples generated from it (Section 4.1). Then, we extend the comparison to four real-world learning curve benchmarks (Section 4.2). Finally, we look beyond the quality of individual extrapolations and evaluate the potential of LC-PFN in the context of predictive early stopping to accelerate model selection (Section 4.3).

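To make the predictive early stopping use case concrete, below is a minimal runnable sketch of predictive termination in the style of Domhan et al. [2015]: each candidate run is stopped once the extrapolator judges it unlikely to beat the incumbent. The `prob_final_exceeds` function is a deliberately naive placeholder (a Gaussian fit to recent observations), not the LC-PFN posterior; names, thresholds, and the demo data are illustrative assumptions, not taken from the paper.

```python
from math import erf, sqrt

import numpy as np

def prob_final_exceeds(curve, target, n_recent=5):
    """Placeholder extrapolator: P(final value > target) from a Gaussian fit
    to the most recent observations. An LC-PFN would instead return this
    probability from its posterior predictive at the final step m."""
    recent = np.asarray(curve[-n_recent:], dtype=float)
    mu, sigma = recent.mean(), recent.std() + 1e-6
    return 0.5 * (1.0 - erf((target - mu) / (sigma * sqrt(2.0))))

def select_with_predictive_termination(curves, threshold=0.05, check_from=10):
    """Evaluate candidates in sequence, stopping a run early once the
    predicted chance of beating the incumbent drops below `threshold`."""
    best = -np.inf
    for curve in curves:  # each curve: per-epoch validation accuracy
        observed = []
        for epoch in range(1, len(curve) + 1):
            observed = curve[:epoch]
            if epoch >= check_from and prob_final_exceeds(observed, best) < threshold:
                break  # unlikely to beat the incumbent: stop this run early
        best = max(best, float(np.max(observed)))
    return best

# Toy demo: three synthetic runs saturating at different plateaus.
rng = np.random.default_rng(0)
curves = [np.minimum(p, 0.02 * np.arange(1, 101)) + 0.01 * rng.random(100)
          for p in (0.6, 0.8, 0.7)]
print(select_with_predictive_termination(curves))
```

In this toy demo the third run is terminated early because its partial curve makes beating the second run's plateau improbable; a real extrapolator would account for the curve's trend rather than only its recent values.
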
Researcher Affiliation: Academia
LLM Response: Steven Adriaensen, Machine Learning Lab, University of Freiburg, adriaens@cs.uni-freiburg.de; Herilalaina Rakotoarison, Machine Learning Lab, University of Freiburg, rakotoah@cs.uni-freiburg.de; Samuel Müller, Machine Learning Lab, University of Freiburg, muellesa@cs.uni-freiburg.de; Frank Hutter, Machine Learning Lab, University of Freiburg, fh@cs.uni-freiburg.de

Pseudocode: No
LLM Response: The paper describes methodologies in text but does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: To facilitate reproducibility and allow others to build on our work, we open-source all code, data, and models used in our experiments at https://github.com/automl/lcpfn.

Open Datasets: Yes
LLM Response: Our dataset comprises 20,000 learning curves, sourced from four distinct benchmarks: LCBench [Zimmer et al., 2021], NAS-Bench-201 [Dong and Yang, 2020], Taskset [Metz et al., 2020], and PD1 [Wang et al., 2022], each contributing 5,000 curves randomly selected from specific subtasks.

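As a concrete reading of the sampling scheme above, here is a minimal sketch that draws 5,000 curves at random from each of the four benchmarks for 20,000 in total. The `load_benchmark_curves` helper is a hypothetical stand-in for each benchmark's own data-access API; its placeholder curves merely keep the sketch runnable end to end.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
benchmarks = ["LCBench", "NAS-Bench-201", "Taskset", "PD1"]

def load_benchmark_curves(name):
    """Hypothetical loader returning a benchmark's learning curves as 1-D
    arrays; in practice each benchmark exposes its own data-access API."""
    return [np.sort(rng.random(100)) for _ in range(10_000)]

dataset = []
for name in benchmarks:
    curves = load_benchmark_curves(name)
    # 5,000 curves per benchmark, drawn without replacement (20,000 total).
    idx = rng.choice(len(curves), size=5_000, replace=False)
    dataset.extend(curves[i] for i in idx)

assert len(dataset) == 20_000
```
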
Dataset Splits: No
LLM Response: The paper describes the internal train/validation/test splits of the deep learning models whose learning curves are studied, but it does not specify train/validation/test splits for the collection of real-world learning curves used to evaluate LC-PFN. Instead, it tests LC-PFN's extrapolation performance on observed partial curves.

Hardware Specification: Yes
LLM Response: Overall, reproducing all our experiments in the main paper requires approximately 163 CPU days and 60 GPU hours on our systems (GPU: NVIDIA GeForce RTX 2080, CPU: Intel Xeon E5-2630 v4 @ 2.20 GHz).

Software Dependencies: No
LLM Response: The paper mentions software components such as the Adam optimizer, cosine annealing, and emcee, but does not provide specific version numbers for these or for the general programming environment (e.g., Python, PyTorch).

Experiment Setup: Yes
LLM Response: LC-PFN architecture and hyperparameters: We use four heads, a hidden size of 1,024, and conduct a thorough ablation study to investigate the effects of the number of layers and embedding size on the final performance, exploring a grid of values (see Table 2). We use a standard training procedure for all experiments, employing the Adam optimizer [Kingma and Ba, 2015] (learning rate 0.0001, batch size 100) with cosine annealing [Loshchilov and Hutter, 2017] and a linear warmup over the first 25% of training epochs. Finally, we set m = 100, implying LC-PFN is trained to extrapolate sequences of up to 100 training steps (e.g., epochs).

Table 2: Grid of hyperparameter values evaluated for MCMC-PP and LC-PFN.
  MCMC-PP: nsamples ∈ {100, 250, 500, 1000, 2000, 4000}; nwalkers ∈ {26, 50, 100}; burn-in ∈ {0, 50, 100, 500}; thin ∈ {1, 10, 100}
  LC-PFN: nb_data ∈ {100k, 1M, 10M}; emsize ∈ {128, 256, 512}; nlayers ∈ {3, 6, 12}

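The quoted training procedure maps onto a standard PyTorch setup. Below is a minimal sketch assuming a per-epoch scheduler step and an illustrative total epoch count (not specified in the excerpt); the model is a stand-in for the LC-PFN transformer, and the authors' exact warmup/annealing implementation may differ.

```python
import itertools
import math
import torch

model = torch.nn.Linear(10, 1)  # stand-in for the LC-PFN transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr 0.0001 (paper)

total_epochs = 400          # illustrative; not specified in the excerpt
warmup = total_epochs // 4  # linear warmup over the first 25% of training

def lr_lambda(epoch):
    """Linear warmup to the base learning rate, then cosine annealing."""
    if epoch < warmup:
        return (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total_epochs - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one epoch of training on batches of 100 prior-sampled curves ...
    optimizer.step()   # placeholder for the actual inner training loop
    scheduler.step()

# Ablation grid from Table 2 (LC-PFN side), enumerable as a product:
lcpfn_grid = list(itertools.product(
    [100_000, 1_000_000, 10_000_000],  # nb_data
    [128, 256, 512],                   # emsize
    [3, 6, 12],                        # nlayers
))
```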