On the Parameterization of Second-Order Optimization Effective towards the Infinite Width
Authors: Satoki Ishikawa, Ryo Karakida
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the effectiveness of our proposed parameterization in the training of various neural networks. In particular, it enables us to transfer optimal learning rates and damping terms from narrow models to wider ones (Sections 5.2 and 5.3). A hedged sketch of this transfer workflow follows the table. |
| Researcher Affiliation | Academia | Satoki Ishikawa, Department of Computer Science, Tokyo Institute of Technology, Japan (riverstone@rio.gsic.titech.ac.jp); Ryo Karakida, Artificial Intelligence Research Center, AIST, Japan (karakida.ryo@aist.go.jp) |
| Pseudocode | No | The paper contains mathematical derivations and descriptions of methods but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "In all experiments, we implemented second-order optimization based on the ASDL library (Osawa et al., 2023a)." However, it does not provide concrete access to the source code for the specific methodology described in this paper, nor does it make a general statement about code release. |
| Open Datasets | Yes | Figure 1: In the upper graph, we trained a 3-layer MLP on the MNIST dataset... In the second graph, we trained a Myrtle-5 on CIFAR10... Figure 4: (Left) We trained CBOW on WikiText2... (Right) We trained ResNet18 on CIFAR100... Figure 5: We trained ResNet50 on ImageNet... |
| Dataset Splits | No | The paper mentions reducing dataset size for some experiments (e.g., "The training sets have been reduced to 256 samples", "The number of samples is reduced to 1024") and "Validation accuracy is highest..." but does not provide specific details on the proportions or counts of training, validation, and test splits for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., specific GPU/CPU models, memory amounts) used to run the experiments. |
| Software Dependencies | No | The paper states: "In all experiments, we implemented second-order optimization based on the ASDL library (Osawa et al., 2023a)." However, it does not provide specific version numbers for ASDL, PyTorch, or any other key software dependencies. |
| Experiment Setup | Yes | Section B.2 "DETAILS OF FIGURES" provides extensive experimental setup details, including learning rates (e.g., "η = 0.001"), damping terms (e.g., "ρ = 1"), data augmentation techniques (e.g., "Random Crop, Random Horizontal Flip, AutoAugment, and Cutout"), and loss functions ("cross-entropy loss with label smoothing"); a minimal sketch of this input pipeline and loss follows the table. |
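The hyperparameter-transfer claim in the Research Type row can be read as the following workflow: tune a hyperparameter on a narrow proxy model, then reuse the tuned value at a larger width. The sketch below is a hypothetical illustration of that loop only; it uses plain SGD on synthetic data, so the paper's second-order optimizer, its damping term ρ, and the width-dependent parameterization that actually keeps the optimum stable across widths are not reproduced here.

```python
# Hypothetical illustration of the "tune narrow, reuse wide" workflow.
# Plain SGD on synthetic data; the paper's second-order updates and
# parameterization are intentionally omitted.
import torch
import torch.nn as nn

def make_mlp(width: int) -> nn.Module:
    return nn.Sequential(nn.Linear(20, width), nn.ReLU(), nn.Linear(width, 2))

def train_loss(width: int, lr: float, steps: int = 200) -> float:
    torch.manual_seed(0)
    x, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
    model, criterion = make_mlp(width), nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Grid-search the learning rate on the narrow proxy model ...
narrow_results = {lr: train_loss(width=64, lr=lr) for lr in [1e-3, 1e-2, 1e-1]}
best_lr = min(narrow_results, key=narrow_results.get)
print("best lr on narrow model:", best_lr)

# ... and reuse the tuned value for a much wider model.
print("loss at width 4096 with transferred lr:", train_loss(width=4096, lr=best_lr))
```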
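For the Experiment Setup row, the quoted configuration maps onto a standard PyTorch/torchvision pipeline as sketched below. The learning rate η = 0.001 and damping ρ = 1 are quoted from Appendix B.2; the label-smoothing factor 0.1, the CIFAR10 normalization statistics, the tiny placeholder model, and the use of RandomErasing as a stand-in for Cutout are assumptions. The paper's second-order preconditioner (implemented via ASDL) is replaced here with a plain SGD step, so this is a sketch of the data pipeline and loss only, not of the optimizer.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

# Augmentations listed in Appendix B.2: Random Crop, Random Horizontal Flip,
# AutoAugment, and Cutout (approximated below with RandomErasing).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    AutoAugment(AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    transforms.RandomErasing(),  # stand-in for Cutout
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Cross-entropy loss with label smoothing; the smoothing value 0.1 is assumed.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Hyperparameters quoted in the row; the damping rho would feed the second-order
# preconditioner, which is omitted, so a plain SGD step stands in for it.
lr, damping = 1e-3, 1.0
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                      nn.Linear(512, 10))  # placeholder instead of Myrtle-5/ResNet
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    break  # one illustrative step
```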