Towards Understanding Generalization via Decomposing Excess Risk Dynamics
Authors: Jiaye Teng, Jianhao Ma, Yang Yuan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on neural networks verify the utility of the decomposition framework. (...) We conduct several experiments on both synthetic datasets and real-world datasets (MNIST, CIFAR-10) to validate the utility of the decomposition framework |
| Researcher Affiliation | Academia | 1Institute for Interdisciplinary Information Sciences, Tsinghua University 2Department of Industrial and Operational Engineering, University of Michigan, Ann Arbor 3Shanghai Qi Zhi Institute |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | No statements regarding the availability of open-source code for the described methodology were found in the paper. |
| Open Datasets | Yes | Experiments on neural networks on both synthetic and real-world datasets (MNIST, CIFAR-10) |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets but does not explicitly provide specific training/validation/test splits (e.g., percentages or counts) or reference standard splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions optimizers like SGD, Adam, and Rprop, and neural network types (ReLU, CNN), but does not provide specific software package names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | In this experiment, we fix the width of each layer to be 64, and use SGD as the optimizer with Gaussian initialization (N(0, σ²) where σ = 1×10⁻³) and stepsize η = 1×10⁻². (...) Adam (stepsize η = 0.002, (β1, β2) = (0.9, 0.999), ϵ = 1e-08, no weight decay), Rprop (learning rate η = 5×10⁻⁴, (η1, η2) = (0.5, 1.2), stepsizes (1×10⁻⁶, 50)). |
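
Since the paper releases no code, the sketch below is only an illustrative collection of the hyperparameters quoted in the Experiment Setup row. The dictionary keys mirror the argument names these values would take in `torch.optim` (`lr`, `betas`, `eps`, `etas`, `step_sizes`); that mapping is an assumption, not something stated in the paper.

```python
# Hyperparameters as quoted from the paper's experiment setup.
# Key names mirror the corresponding torch.optim arguments; this is an
# illustrative sketch, not the authors' actual training configuration code.

LAYER_WIDTH = 64   # width of each layer in the network
INIT_STD = 1e-3    # Gaussian initialization: weights ~ N(0, sigma^2), sigma = 1e-3

OPTIMIZER_SETTINGS = {
    "SGD": {"lr": 1e-2},                       # stepsize eta = 1e-2
    "Adam": {
        "lr": 2e-3,                            # stepsize eta = 0.002
        "betas": (0.9, 0.999),                 # (beta1, beta2)
        "eps": 1e-8,
        "weight_decay": 0.0,                   # "no weight decay"
    },
    "Rprop": {
        "lr": 5e-4,                            # learning rate eta = 5e-4
        "etas": (0.5, 1.2),                    # (eta1, eta2)
        "step_sizes": (1e-6, 50),              # min/max step sizes
    },
}
```

For example, under the assumed mapping, `torch.optim.Adam(model.parameters(), **OPTIMIZER_SETTINGS["Adam"])` would reproduce the reported Adam configuration.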