Towards Learning Universal Hyperparameter Optimizers with Transformers

Authors: Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'Aurelio Ranzato, Sagi Perel, Nando de Freitas

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our extensive experiments demonstrate that the OPTFORMER can simultaneously imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates." "Extensive experiments on both public and private datasets demonstrate the OPTFORMER's competitive tuning and generalization abilities." "We evaluate mainly on the two natural HPO benchmarks, Real World Data and HPO-B." Section 6 is titled "Experiments".
Researcher Affiliation | Industry | "Yutian Chen¹, Xingyou Song², Chansoo Lee², Zi Wang², Qiuyi Zhang², David Dohan², Kazuya Kawakami¹, Greg Kochanski², Arnaud Doucet¹, Marc'Aurelio Ranzato¹, Sagi Perel², Nando de Freitas¹; ¹DeepMind, ²Google Research, Brain Team"
Pseudocode | No | The paper describes the model architecture and procedures (e.g., tokenization, inference), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Code: https://github.com/google-research/optformer.
Open Datasets | Yes | "The emergence of public machine learning data platforms such as OpenML [1] and hyperparameter optimization (HPO) services such as Google Vizier [2]... have made large-scale datasets containing hyperparameter evaluations accessible." "In addition, we create two new datasets based on public benchmarks. HPO-B is the largest public benchmark for HPO containing about 1.9K tuning tasks... [5]." "For further control over specific function dimensions and properties, we use the blackbox optimization benchmark BBOB [48]." The datasets generated on the public benchmarks, BBOB and HPO-B, can be reproduced by running publicly available HPO algorithms (see the data-generation sketch below the table).
Dataset Splits | No | "The train/test subsets of Real World Data are split temporally to avoid information leak (see Appendix C for details)." The paper does not explicitly specify a validation split (e.g., percentages or counts). A temporal-split sketch appears below the table.
Hardware Specification | Yes | "Our model is implemented in T5x [30] and trained on TPU-v4 chips." "The full model has 250M parameters and is trained for 2M steps with a batch size of 2048."
Software Dependencies | No | "We adopt the T5 Transformer encoder-decoder architecture [30]." "The shortened text string is then converted to a sequence of tokens via the SentencePiece tokenizer [44]." "Our model is implemented in T5x [30] and trained on TPU-v4 chips." The paper mentions software tools such as T5x and SentencePiece but does not provide specific version numbers for them. A tokenization sketch appears below the table.
Experiment Setup | Yes | "We train a single Transformer model with 250M parameters on the union of the three datasets described above, Real World Data, HPO-B, and BBOB (hyperparameter details in Appendix D.2)." "The full model has 250M parameters and is trained for 2M steps with a batch size of 2048." "We sample M = 100 candidate suggestions from π_prior." "For the historical sequence h, we convert every DOUBLE and INTEGER parameter along with every function value into a single token, by normalizing and discretizing them into integers, with a quantization level of Q = 1000." A quantization sketch appears below the table.
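The Open Datasets row notes that the BBOB and HPO-B training data can be regenerated by running publicly available HPO algorithms on public benchmarks. The sketch below illustrates that idea under simplifying assumptions: it uses the sphere function as a stand-in for the BBOB suite and random search as the HPO algorithm, so the function, search domain, and trial budget are illustrative rather than the paper's actual data pipeline.

```python
import numpy as np


def sphere(x: np.ndarray) -> float:
    """Sphere function, one of the simplest BBOB-style test functions."""
    return float(np.sum(x ** 2))


def random_search_trajectory(fn, dim: int, num_trials: int, seed: int = 0):
    """Run random search on `fn` and record the (parameters, value) trajectory.

    This mimics the idea of regenerating training data by running a public
    HPO algorithm on a public benchmark; the paper's datasets were produced
    with the specific algorithms and benchmark functions it describes.
    """
    rng = np.random.default_rng(seed)
    trajectory = []
    for _ in range(num_trials):
        x = rng.uniform(-5.0, 5.0, size=dim)  # BBOB domains are typically [-5, 5]^d
        trajectory.append((x, fn(x)))
    return trajectory


# Example: a 4-dimensional trajectory with 50 trials.
traj = random_search_trajectory(sphere, dim=4, num_trials=50)
```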
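For the Dataset Splits row, the paper states only that the Real World Data train/test subsets are split temporally (details deferred to its Appendix C). The following is a minimal sketch of such a split, assuming each study record carries a creation timestamp; the `created_at` field and the cutoff date are hypothetical and not the paper's schema.

```python
from typing import List, Tuple


def temporal_split(studies: List[dict], cutoff: str) -> Tuple[List[dict], List[dict]]:
    """Split studies by creation time so no test study predates a training one.

    `studies` is assumed to be a list of dicts with an ISO-formatted
    'created_at' field; both the field name and the cutoff are illustrative.
    """
    train = [s for s in studies if s["created_at"] < cutoff]
    test = [s for s in studies if s["created_at"] >= cutoff]
    return train, test


# Example with toy records.
studies = [
    {"name": "study_a", "created_at": "2019-03-01"},
    {"name": "study_b", "created_at": "2020-07-15"},
]
train, test = temporal_split(studies, cutoff="2020-01-01")
```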
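For the Software Dependencies row, the paper converts a shortened text representation of study metadata into tokens with the SentencePiece tokenizer. The sketch below shows how such a conversion could look with the open-source `sentencepiece` package; the model-file path and the example serialization string are placeholders, since the paper does not pin a vocabulary file or version.

```python
import sentencepiece as spm

# Path to a pretrained SentencePiece model; placeholder only, since the paper
# does not specify a particular vocabulary file or version.
sp = spm.SentencePieceProcessor(model_file="path/to/sentencepiece.model")

# A shortened text string describing study metadata, in the spirit of the
# paper's text serialization (the exact format is defined in the paper).
text = "title:cifar10 tuning,objective:accuracy,lr:DOUBLE[1e-5,1e-1]"

token_ids = sp.encode(text, out_type=int)  # integer token ids
pieces = sp.encode(text, out_type=str)     # subword pieces, for inspection
```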
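For the Experiment Setup row, every DOUBLE and INTEGER parameter and every function value is mapped to a single token by normalizing and discretizing it into an integer with quantization level Q = 1000. Below is a minimal sketch of that normalize-and-discretize step, assuming simple linear scaling over a known parameter range; the paper's exact handling of scaling (e.g., log-spaced parameters) and of function-value ranges is described there, not here.

```python
Q = 1000  # quantization level stated in the paper


def quantize(value: float, low: float, high: float, q: int = Q) -> int:
    """Normalize `value` to [0, 1] using its parameter range, then discretize
    it into one of `q` integer bins so it can be emitted as a single token.

    Linear scaling is an assumption of this sketch, not the paper's full rule.
    """
    normalized = (value - low) / (high - low)
    normalized = min(max(normalized, 0.0), 1.0)  # clip to the valid range
    return min(int(normalized * q), q - 1)


def dequantize(token: int, low: float, high: float, q: int = Q) -> float:
    """Map a token back to the center of its bin in the original range."""
    return low + (token + 0.5) / q * (high - low)


# Example: a learning rate of 3e-3 in the range [1e-5, 1e-1].
tok = quantize(3e-3, low=1e-5, high=1e-1)
approx = dequantize(tok, low=1e-5, high=1e-1)
```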