Information-Theoretic Local Minima Characterization and Regularization
Authors: Zhiwei Jia, Hao Su
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are performed on CIFAR-10, CIFAR-100 and ImageNet for various network architectures. |
| Researcher Affiliation | Academia | 1University of California, San Diego. Correspondence to: Zhiwei Jia <zjia@ucsd.edu>, Hao Su <haosu@eng.ucsd.edu>. |
| Pseudocode | Yes | Algorithm 1 Regularized Gradient Descent |
| Open Source Code | Yes | The code is available at https://github.com/SeanJia/InfoMCR. |
| Open Datasets | Yes | Experiments are performed on CIFAR-10, CIFAR-100 and ImageNet for various network architectures. |
| Dataset Splits | Yes | select β by validation via a 45k/5k training data split for each of the network architecture & dataset pair. |
| Hardware Specification | Yes | We benchmark WRN-18 on the down-sampled ImageNet classification dataset with 2 Nvidia 2080 Ti GPUs and a batch size of 128. |
| Software Dependencies | No | The paper mentions 'TensorFlow' for implementation details, but it does not specify version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | For the three hyper-parameters α, β, M in our proposed Algorithm 1, we find α and M quite robust and manually set α = 0.0001, M = 8 in all experiments and select β by validation via a 45k/5k training data split for each of the network architecture & dataset pair. In specific, we consider β ∈ {1, 5, 10, 20, 30, 40, 50, 75, 100}. We keep all the other training hyper-parameters, schemes as well as the setup identical to those in their original paper whenever possible (details in Appendix E). |
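
For concreteness, the experiment-setup row above can be read as the following minimal Python sketch of the hyper-parameter protocol: α and M are held fixed, and β is picked by validation accuracy on a 45k/5k split of the training data. This is not the authors' released code; `train_and_validate` is a hypothetical placeholder for training with the paper's regularizer (Algorithm 1) and reporting validation accuracy.

```python
# Hedged sketch of the quoted hyper-parameter selection; not the authors' implementation.
import random

ALPHA = 1e-4                                   # fixed in all experiments per the paper
M = 8                                          # fixed in all experiments per the paper
BETA_GRID = [1, 5, 10, 20, 30, 40, 50, 75, 100]  # candidate values of beta


def split_train_val(indices, n_val=5000, seed=0):
    """Shuffle the 50k training indices into a 45k train / 5k validation split."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


def train_and_validate(beta, train_idx, val_idx):
    """Placeholder: train with regularization strength `beta`, return validation accuracy.

    Replace this stub with the actual regularized training loop
    (e.g. the authors' TensorFlow code) for a real run.
    """
    return 0.0


train_idx, val_idx = split_train_val(range(50000))
best_beta = max(BETA_GRID, key=lambda b: train_and_validate(b, train_idx, val_idx))
print("selected beta:", best_beta)
```

Once β is selected on the 45k/5k split, the quoted setup keeps all other training hyper-parameters and schedules identical to the original architectures' papers wherever possible.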