Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks
Authors: Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, Frank Hutter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments aim to test the hypothesis that PFNs present a practical Bayesian approach to learning curve extrapolation. To this end, we first compare our LC-PFN approach against the MCMC approach of Domhan et al. [2015], using the same prior on samples generated from it (Section 4.1). Then, we extend the comparison to four real-world learning curve benchmarks (Section 4.2). Finally, we look beyond the quality of individual extrapolations and evaluate the potential of LC-PFN in the context of predictive early stopping to accelerate model selection (Section 4.3). (Illustrative sketches of the MCMC baseline and of PFN-style single-pass inference appear after this table.) |
| Researcher Affiliation | Academia | Steven Adriaensen (adriaens@cs.uni-freiburg.de), Herilalaina Rakotoarison (rakotoah@cs.uni-freiburg.de), Samuel Müller (muellesa@cs.uni-freiburg.de), and Frank Hutter (fh@cs.uni-freiburg.de), all at the Machine Learning Lab, University of Freiburg. |
| Pseudocode | No | The paper describes methodologies in text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To facilitate reproducibility and allow others to build on our work, we open-source all code, data, and models used in our experiments at https://github.com/automl/lcpfn. |
| Open Datasets | Yes | Our dataset comprises 20 000 learning curves, sourced from four distinct benchmarks: LCBench [Zimmer et al., 2021], NAS-Bench-201 [Dong and Yang, 2020], Taskset [Metz et al., 2020] and PD1 [Wang et al., 2022], each contributing 5 000 curves, randomly selected from specific subtasks. |
| Dataset Splits | No | The paper describes the internal train/validation/test splits of the deep learning models whose learning curves are studied, but it does not specify train/validation/test splits for the collection of real-world learning curves used to evaluate LC-PFN. Instead, it tests LC-PFN's extrapolation performance on observed partial curves. |
| Hardware Specification | Yes | Overall, reproducing all our experiments in the main paper requires approximately 163 CPU days and 60 GPU hours on our systems (GPU: NVIDIA GeForce RTX 2080, CPU: Intel Xeon E5-2630 v4 @ 2.20 GHz). |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'cosine annealing', and 'emcee', but does not provide specific version numbers for these or for general programming environments (e.g., Python, PyTorch). |
| Experiment Setup | Yes | LC-PFN architecture and hyperparameters: We use four heads, a hidden size of 1 024, and conduct a thorough ablation study to investigate the effects of the number of layers and embedding size on the final performance, exploring a grid of values (see Table 2). We use a standard training procedure for all experiments, employing the Adam optimizer [Kingma and Ba, 2015] (learning rate 0.0001, batch size 100) and cosine annealing [Loshchilov and Hutter, 2017] with a linear warmup over the first 25% of training epochs. Finally, we set m = 100, meaning LC-PFN is trained to extrapolate sequences of up to 100 training steps (e.g., epochs). Table 2 (grids of hyperparameter values evaluated for MCMC-PP and LC-PFN): MCMC: nsamples ∈ {100, 250, 500, 1000, 2000, 4000}, nwalkers ∈ {26, 50, 100}, burn-in ∈ {0, 50, 100, 500}, thin ∈ {1, 10, 100}; PFN: nb_data ∈ {100k, 1M, 10M}, emsize ∈ {128, 256, 512}, nlayers ∈ {3, 6, 12}. (Sketches of this training schedule and of enumerating the Table 2 grids appear below.) |
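
The MCMC baseline referenced in the Research Type row fits parametric curve models with an ensemble sampler such as emcee. Below is a minimal sketch of that idea, assuming a single pow3 family f(x) = c - a * x^(-alpha) (one of the parametric models in Domhan et al. [2015]) and a flat prior over a simple support region; the paper's actual baseline combines several such families, so this is an illustration of the approach, not a reproduction. The sampler settings (nwalkers = 50, burn-in = 100, thin = 10) are taken from the Table 2 grid.

```python
import numpy as np
import emcee

# pow3 family: f(x) = c - a * x**(-alpha), one of the parametric
# curve models used by Domhan et al. [2015].
def pow3(x, theta):
    c, a, alpha = theta
    return c - a * x ** (-alpha)

def log_prob(theta, x, y, sigma=0.01):
    c, a, alpha = theta
    if not (0.0 < c <= 1.0 and a > 0.0 and alpha > 0.0):
        return -np.inf  # flat prior over a simple support region (assumed for illustration)
    resid = y - pow3(x, theta)
    return -0.5 * np.sum((resid / sigma) ** 2)  # Gaussian noise likelihood

rng = np.random.default_rng(0)
x_obs = np.arange(1, 11).astype(float)  # first 10 epochs of a partial curve
y_obs = pow3(x_obs, (0.9, 0.5, 0.8)) + rng.normal(0.0, 0.01, x_obs.size)

nwalkers, ndim = 50, 3  # nwalkers, burn-in, and thin taken from the Table 2 grid
p0 = rng.uniform([0.5, 0.1, 0.1], [1.0, 1.0, 2.0], size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(x_obs, y_obs))
sampler.run_mcmc(p0, 1000)

samples = sampler.get_chain(discard=100, thin=10, flat=True)
y100 = np.array([pow3(100.0, th) for th in samples])  # posterior predictive at epoch 100
print(f"epoch-100 prediction: {y100.mean():.3f} +/- {y100.std():.3f}")
```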
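By contrast, a PFN amortizes this inference: observed points and query positions pass through a transformer once, and the output head emits a discretized predictive distribution per query. The toy sketch below illustrates that inference pattern only; it is not the paper's architecture (the real LC-PFN uses the layer counts and embedding sizes from Table 2, a different point encoding, and masks attention so queries attend only to observations, which is omitted here).

```python
import torch
import torch.nn as nn

# Toy sketch of the PFN inference pattern: observed (t, y) points and query
# positions go through a transformer in ONE forward pass, and the head emits
# logits over discrete y-buckets (a discretized predictive distribution).
class ToyLCPFN(nn.Module):
    def __init__(self, emsize=128, nlayers=3, nheads=4, nbuckets=100):
        super().__init__()
        self.embed = nn.Linear(2, emsize)  # a (t, y) pair -> token; y = 0 for queries
        layer = nn.TransformerEncoderLayer(emsize, nheads, 4 * emsize, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(emsize, nbuckets)  # logits over discrete y-buckets

    def forward(self, t_obs, y_obs, t_query):
        obs = torch.stack([t_obs, y_obs], dim=-1)  # (B, n_obs, 2)
        qry = torch.stack([t_query, torch.zeros_like(t_query)], dim=-1)
        hidden = self.encoder(self.embed(torch.cat([obs, qry], dim=1)))
        return self.head(hidden[:, t_obs.shape[1]:])  # keep query positions only

model = ToyLCPFN()
t_obs = torch.linspace(0.01, 0.10, 10).unsqueeze(0)   # 10 observed epochs (of 100, rescaled)
y_obs = 1.0 - 0.8 * torch.exp(-10 * t_obs)            # toy increasing accuracy curve
t_query = torch.tensor([[0.50, 1.00]])                # query epochs 50 and 100
probs = model(t_obs, y_obs, t_query).softmax(dim=-1)  # one forward pass -> predictive
print(probs.shape)                                    # torch.Size([1, 2, 100])
```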
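The stated training procedure (Adam with learning rate 0.0001, batch size 100, cosine annealing with linear warmup over the first 25% of training) can be sketched in PyTorch as follows; the model, loss, and total step count are placeholders, since the row above does not fix a per-step budget.

```python
import math
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the LC-PFN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr 0.0001, as stated

total_steps = 1000                       # placeholder schedule length
warmup_steps = int(0.25 * total_steps)   # linear warmup over the first 25%

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine annealing

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(100, 10)).pow(2).mean()  # batch size 100 (dummy loss)
    loss.backward()
    optimizer.step()
    scheduler.step()
```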
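The Table 2 grids can also be written out and enumerated directly; the dict layout and helper below are illustrative, and the Cartesian products give 216 MCMC-PP and 27 LC-PFN configurations.

```python
from itertools import product

# The Table 2 hyperparameter grids as Python dicts.
mcmc_grid = {
    "nsamples": [100, 250, 500, 1000, 2000, 4000],
    "nwalkers": [26, 50, 100],
    "burn_in": [0, 50, 100, 500],
    "thin": [1, 10, 100],
}
pfn_grid = {
    "nb_data": [100_000, 1_000_000, 10_000_000],
    "emsize": [128, 256, 512],
    "nlayers": [3, 6, 12],
}

def configs(grid):
    """Yield every configuration in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

print(sum(1 for _ in configs(mcmc_grid)))  # 6 * 3 * 4 * 3 = 216 MCMC-PP configs
print(sum(1 for _ in configs(pfn_grid)))   # 3 * 3 * 3 = 27 LC-PFN configs
```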