Efficient Non-Parametric Optimizer Search for Diverse Tasks

Authors: Ruochen Wang, Yuanhao Xiong, Minhao Cheng, Cho-Jui Hsieh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate the proposed framework on a suite of tasks, covering a variety of models and datasets.
Researcher Affiliation | Academia | Ruochen Wang1, Yuanhao Xiong1, Minhao Cheng2, Cho-Jui Hsieh1; 1Department of Computer Science, UCLA; 2HKUST
Pseudocode | Yes | Algorithm 1 in the Appendix provides a detailed summary of the complete search process.
Open Source Code | Yes | Our code is publicly available at https://github.com/ruocwang/enos.
Open Datasets | Yes | We extensively evaluate the proposed framework on a diverse set of learning tasks: digit classification with MNISTNET [10], image classification with Conv Net [15], graph learning with (Cluster-)GAT [21, 28], norm-bounded adversarial attack on robustly trained models [20, 45, 46], and BERT fine-tuning on NLP datasets [34, 56].
Dataset Splits | No | The paper uses well-known datasets (e.g., MNIST, CIFAR-10) that have standard splits, but it does not explicitly state the train/validation/test splits used (e.g., percentages or sample counts).
Hardware Specification | Yes | Under this setting, our method finishes in 0.92h on RTX 2080ti, much faster than L2LGD2 (2.62h).
Software Dependencies | No | The paper mentions using Hugging Face implementations and Python, but does not specify software libraries with exact version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | "Across all experiments, we limit the maximum level of MCT traversal to 4, and set the number of Monte Carlo samples to 32 (a multiple of 8 for parallelism on 8-GPU servers) for each level. This amounts to a fixed total budget of 128 evaluations. The maximum depth for the super-tree is set to 10." Also: "The batch size is set to 32." and "we set ε = 8/255, and run each optimizer once for 100 steps on every image from the test split [46]."
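The budget quoted in the Experiment Setup row is internally consistent: 4 traversal levels times 32 Monte Carlo samples per level gives the stated 128 evaluations, and 32 splits evenly across 8 GPUs. A minimal sketch of that arithmetic (illustrative names only, not the authors' code):

```python
# Hypothetical sketch of the search budget described above:
# 4 MCT traversal levels, 32 Monte Carlo samples per level,
# dispatched in equal shares to an 8-GPU server.

MAX_LEVELS = 4        # maximum level of MCT traversal
SAMPLES_PER_LEVEL = 32  # Monte Carlo samples per level
GPUS = 8              # parallel workers per server


def total_evaluations(levels=MAX_LEVELS, samples=SAMPLES_PER_LEVEL):
    """Fixed total budget: one optimizer evaluation per sample per level."""
    return levels * samples


def samples_per_gpu(samples=SAMPLES_PER_LEVEL, gpus=GPUS):
    """Each level's samples split evenly across GPUs (32 is a multiple of 8)."""
    assert samples % gpus == 0, "sample count must be a multiple of GPU count"
    return samples // gpus


print(total_evaluations())  # 128
print(samples_per_gpu())    # 4
```

This only reproduces the paper's stated numbers; the actual search procedure is given as Algorithm 1 in the paper's Appendix.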