On the Parameterization of Second-Order Optimization Effective towards the Infinite Width

Authors: Satoki Ishikawa, Ryo Karakida

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify the effectiveness of our proposed parameterization in the training of various neural networks. In particular, it enables us to transfer optimal learning rates and damping terms from narrow models to wider ones (Sections 5.2 and 5.3).
Researcher Affiliation | Academia | Satoki Ishikawa, Department of Computer Science, Tokyo Institute of Technology, Japan (riverstone@rio.gsic.titech.ac.jp); Ryo Karakida, Artificial Intelligence Research Center, AIST, Japan (karakida.ryo@aist.go.jp)
Pseudocode | No | The paper contains mathematical derivations and descriptions of its methods but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "In all experiments, we implemented second-order optimization based on the ASDL library (Osawa et al., 2023a)." However, it provides neither concrete access to source code for the specific methodology described in the paper nor a general statement about code release.
Open Datasets | Yes | Figure 1: In the upper graph, we trained a 3-layer MLP on the MNIST dataset... In the second graph, we trained a Myrtle-5 on CIFAR10... Figure 4: (Left) We trained CBOW on WikiText-2... (Right) We trained ResNet18 on CIFAR100... Figure 5: We trained ResNet50 on ImageNet...
Dataset Splits | No | The paper mentions reducing the dataset size for some experiments (e.g., "The training sets have been reduced to 256 samples", "The number of samples is reduced to 1024") and refers to validation accuracy ("Validation accuracy is highest..."), but it does not give the proportions or counts of the training, validation, and test splits needed for reproducibility.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models or memory amounts) used to run the experiments.
Software Dependencies | No | The paper states: "In all experiments, we implemented second-order optimization based on the ASDL library (Osawa et al., 2023a)." However, it does not give version numbers for ASDL, PyTorch, or any other key software dependencies.
Experiment Setup | Yes | Section B.2 "DETAILS OF FIGURES" provides extensive experimental setup details, including learning rates (e.g., "η = 0.001"), damping terms (e.g., "ρ = 1"), data augmentation techniques ("Random Crop, Random Horizontal Flip, AutoAugment, and Cutout"), and loss functions ("cross-entropy loss with label smoothing").
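To make the quoted experiment-setup details concrete, the following is a minimal PyTorch/torchvision sketch of the data pipeline and loss described in Section B.2 (Random Crop, Random Horizontal Flip, AutoAugment, Cutout, cross-entropy with label smoothing) for a CIFAR-style run. It is an illustrative assumption, not the authors' code: the label-smoothing factor, batch size, and the use of RandomErasing as a stand-in for Cutout are placeholders, and the second-order optimizer built on the ASDL library is not reproduced here.

```python
# Minimal sketch of the training setup quoted in Section B.2.
# Assumptions: CIFAR100/ResNet-style image pipeline; label_smoothing=0.1,
# batch_size=128, and Cutout-via-RandomErasing are illustrative placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Augmentations named in the paper: Random Crop, Random Horizontal Flip,
# AutoAugment, and Cutout (approximated here with RandomErasing on the tensor).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
    transforms.RandomErasing(p=1.0, scale=(0.25, 0.25)),  # Cutout-like masking
])

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Cross-entropy loss with label smoothing, as stated in the paper
# (the smoothing factor is not quoted; 0.1 is a placeholder).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Hyperparameters quoted in Section B.2; the second-order optimizer itself
# is implemented with the ASDL library in the paper and is omitted here.
lr = 0.001       # learning rate (eta = 0.001)
damping = 1.0    # damping term (rho = 1)
```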