Hyperparameter Optimization Is Deceiving Us, and How to Stop It
Authors: A. Feder Cooper, Yucheng Lu, Jessica Forde, Christopher M. De Sa
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our framework enables us to prove EHPO methods that are guaranteed to be defended against deception, given bounded compute time budget t. We demonstrate our framework's utility by proving and empirically validating a defended variant of random search. Validating our defense empirically and selecting hyper-HPs. Any defense ultimately depends on the hyper-HPs it uses. |
| Researcher Affiliation | Academia | A. Feder Cooper Cornell University afc78@cornell.edu Yucheng Lu Cornell University yl2967@cornell.edu Jessica Zosa Forde Brown University jforde2@cs.brown.edu Christopher De Sa Cornell University cdesa@cs.cornell.edu |
| Pseudocode | Yes | Algorithm 1 Defense with Random Search |
| Open Source Code | Yes | All code can be found at https://github.com/pasta41/deception. |
| Open Datasets | Yes | We first reproduce Wilson et al. [72], in which the authors trained VGG16 with different optimizers on CIFAR-10 (Figure 1a). |
| Dataset Splits | No | The paper mentions 'usually split into train and validation sets' generally, and states 'The input dataset X can be split in various ways, as a function of the random seed r.' However, it does not provide specific percentages, counts, or explicit methods for the validation split used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU models, CPU types, or cloud computing instance specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'Python 3.x,' 'PyTorch 1.x,' 'CUDA x.x') that would be necessary to replicate the experiments. |
| Experiment Setup | Yes | We change the hyper-HPs, shifting the distribution until Adam's performance starts to degrade, and use the resulting hyper-HPs ([10^10, 10^12]) to run our defense (Appendix). We now run a modified version of our defended EHPO in Definition 7, described in Algorithm 1, with K_R = 600 (200 logs for each optimizer). Using a budget of M = 10000 iterations, we subsample κ = 11 logs. (A hedged code sketch of this defended procedure appears after the table.) |
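
The experiment-setup row above describes the defended random-search procedure (Algorithm 1) at a high level: precompute a pool of training logs (200 per optimizer), then repeatedly subsample κ = 11 logs per optimizer and only draw a conclusion that survives resampling. The sketch below is a minimal illustration of that idea, not the authors' exact Algorithm 1; the names `TrialLog`, `best_val_acc`, and `defended_conclusion`, the abstain-on-any-disagreement rule, and the synthetic log values are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class TrialLog:
    """One precomputed random-search trial: optimizer name, sampled HPs, best validation accuracy."""
    optimizer: str
    hyperparams: dict
    val_acc: float


def best_val_acc(logs, optimizer, k, rng):
    """Subsample k logs for one optimizer and return the best validation accuracy among them."""
    pool = [log for log in logs if log.optimizer == optimizer]
    subsample = rng.sample(pool, k)
    return max(log.val_acc for log in subsample)


def defended_conclusion(logs, opt_a, opt_b, k=11, repeats=100, seed=0):
    """
    Hypothetical defended comparison: declare 'opt_a beats opt_b' only if the
    conclusion holds in every one of `repeats` independent subsamples of k logs.
    Otherwise abstain (return None), because a ranking that flips when the logs
    are redrawn is exactly the kind of conclusion that hyper-HP choices can fake.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        if best_val_acc(logs, opt_a, k, rng) <= best_val_acc(logs, opt_b, k, rng):
            return None  # conclusion is not robust to resampling; refuse to conclude
    return f"{opt_a} > {opt_b}"


if __name__ == "__main__":
    # Synthetic pool of 200 logs per optimizer (values are made up for the example).
    rng = random.Random(42)
    logs = [TrialLog("sgd", {"lr": 10 ** rng.uniform(-3, 0)}, rng.uniform(0.85, 0.93)) for _ in range(200)]
    logs += [TrialLog("adam", {"lr": 10 ** rng.uniform(-5, -2)}, rng.uniform(0.80, 0.91)) for _ in range(200)]
    print(defended_conclusion(logs, "sgd", "adam", k=11))
```

The abstain-unless-unanimous rule is one simple way to capture the paper's intent that a defended EHPO procedure should not emit conclusions an adversarial choice of hyper-HPs could overturn; the actual defense in the paper is stated in terms of its logical framework and budget t, and its acceptance criterion may differ from this sketch.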