Information-Theoretic Local Minima Characterization and Regularization

Authors: Zhiwei Jia, Hao Su

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments are performed on CIFAR-10, CIFAR-100 and ImageNet for various network architectures. |
| Researcher Affiliation | Academia | University of California, San Diego. Correspondence to: Zhiwei Jia <zjia@ucsd.edu>, Hao Su <haosu@eng.ucsd.edu>. |
| Pseudocode | Yes | Algorithm 1: Regularized Gradient Descent |
| Open Source Code | Yes | The code is available at https://github.com/SeanJia/InfoMCR. |
| Open Datasets | Yes | Experiments are performed on CIFAR-10, CIFAR-100 and ImageNet for various network architectures. |
| Dataset Splits | Yes | β is selected by validation via a 45k/5k training-data split for each network architecture & dataset pair. |
| Hardware Specification | Yes | We benchmark WRN-18 on the down-sampled ImageNet classification dataset with 2 Nvidia 2080 Ti GPUs and a batch size of 128. |
| Software Dependencies | No | The paper mentions TensorFlow in its implementation details, but it does not specify version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | For the three hyper-parameters α, β, M in our proposed Algorithm 1, we find α and M quite robust and manually set α = 0.0001, M = 8 in all experiments, and select β by validation via a 45k/5k training-data split for each network architecture & dataset pair. Specifically, we consider β ∈ {1, 5, 10, 20, 30, 40, 50, 75, 100}. We keep all the other training hyper-parameters, schemes, and setup identical to those in their original paper whenever possible (details in Appendix E). A sketch of this β-selection loop follows the table. |
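
The Experiment Setup row pins down the β-selection procedure well enough to make it concrete. Below is a minimal sketch, assuming a standard 50,000-example CIFAR training set: the 45k/5k split, α = 0.0001, M = 8, and the β grid are taken from the table above, while `split_indices`, `select_beta`, and `train_and_validate` are hypothetical names standing in for the authors' training code, not their actual API.

```python
import random

# Values reported in the paper's experiment setup (see the table above).
ALPHA = 1e-4                                      # alpha, fixed in all experiments
M = 8                                             # M, fixed in all experiments
BETA_GRID = [1, 5, 10, 20, 30, 40, 50, 75, 100]   # candidate beta values

def split_indices(n_train: int = 50_000, n_val: int = 5_000, seed: int = 0):
    """Split a 50k-example training set into 45k train / 5k validation indices."""
    idx = list(range(n_train))
    random.Random(seed).shuffle(idx)              # fixed seed keeps the split stable
    return idx[n_val:], idx[:n_val]

def select_beta(train_and_validate) -> float:
    """Return the beta with the best validation accuracy.

    `train_and_validate(alpha, beta, m, train_idx, val_idx)` is a hypothetical
    callable: one full training run of the paper's Algorithm 1 for a given
    architecture & dataset pair, returning validation accuracy.
    """
    train_idx, val_idx = split_indices()
    scores = {
        beta: train_and_validate(ALPHA, beta, M, train_idx, val_idx)
        for beta in BETA_GRID
    }
    return max(scores, key=scores.get)
```

Shuffling with a fixed seed keeps the same 45k/5k split across all nine candidate β runs, so the validation scores remain comparable.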