Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Authors: Avrajit Ghosh, Soo Min Kwon, Rongrong Wang, Saiprasad Ravishankar, Qing Qu

ICLR 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We present experiments to support our theory, along with examples demonstrating how these phenomena occur in nonlinear networks and how they differ from those which have benign landscape such as in DLNs. (...) 4 EXPERIMENTAL RESULTS" |
| Researcher Affiliation | Academia | "Avrajit Ghosh¹, Soo Min Kwon², Rongrong Wang¹, Saiprasad Ravishankar¹, Qing Qu²; ¹Michigan State University, ²University of Michigan. Equal contribution; correspondence to EMAIL; EMAIL" |
| Pseudocode | No | The paper describes its methodology and analyses using mathematical equations and textual explanations, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | "For the regression task, we minimize the loss L(Θ) = ∥G(Θ) − y_image∥₂², where G(Θ) is a UNet parameterized by Θ, and y_image denotes one of the images in Figure 10b. (...) we train a 2-layer fully connected neural network on N labeled training images from the CIFAR-10 dataset using MSE loss and plot the sharpness in Figure 10c. (...) on a 5K subset of the MNIST dataset, following Cohen et al. (2021). (...) on a subsampled 20K set on MNIST and CIFAR-10." |
| Dataset Splits | No | The paper mentions using a "5K subset of the MNIST dataset", "a subsampled 20K set on MNIST and CIFAR-10", and "N labeled training images from the CIFAR-10 dataset", but it does not provide specific training/test/validation splits (e.g., percentages or exact sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers like SGD and architectures like UNet and MLP, but it does not provide specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | "Here, η > 2/β corresponds to the EOS regime... For the DLN, we consider a 3-layer network... with an initialization scale of α = 0.01. (...) we train a 2-layer fully connected neural network on N labeled training images from the CIFAR-10 dataset using MSE loss and plot the sharpness in Figure 10c. (...) a 3-layer MLP without bias terms for the weights, with each hidden layer consisting of 1000 units. The network is trained using MSE loss with a learning rate of η = 4, along with random weights scaled by α = 0.01 and full-batch gradient descent on a 5K subset of the MNIST dataset, following Cohen et al. (2021). (...) we consider a 4-layer MLP with ReLU activations with each hidden layer of size 200 for classification on a subsampled 20K set on MNIST and CIFAR-10." |
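To make the quoted training configuration concrete, here is a minimal NumPy sketch of full-batch gradient descent on a 3-layer deep linear network (deep matrix factorization), the model class the paper studies. The target matrix, dimensions, learning rate η, and initialization scale α below are illustrative placeholders, not the paper's values: the paper uses α = 0.01 and learning rates in the EOS regime η > 2/β, whereas here η is kept small enough that plain gradient descent converges.

```python
import numpy as np

# Sketch: minimize 0.5 * ||W3 W2 W1 - M||_F^2 with full-batch gradient
# descent, starting from small random weights (scale alpha).
rng = np.random.default_rng(0)

M = np.diag([3.0, 2.0, 1.0])   # target matrix to factorize (placeholder)
alpha = 0.1                    # small initialization scale (paper: 0.01)
eta = 0.05                     # learning rate, kept below 2/beta here

W1 = alpha * rng.standard_normal((3, 3))
W2 = alpha * rng.standard_normal((3, 3))
W3 = alpha * rng.standard_normal((3, 3))

losses = []
for _ in range(2000):
    E = W3 @ W2 @ W1 - M          # residual of the current factorization
    losses.append(0.5 * np.sum(E**2))
    dW3 = E @ (W2 @ W1).T         # gradients of 0.5 * ||E||_F^2
    dW2 = W3.T @ E @ W1.T
    dW1 = (W3 @ W2).T @ E
    W1 -= eta * dW1               # full-batch gradient descent step
    W2 -= eta * dW2
    W3 -= eta * dW3
```

With a small initialization scale the loss plateaus while the factors align and grow before fitting the target, which is the small-initialization regime the quoted setup refers to; pushing η past 2/β would instead produce the edge-of-stability oscillations the paper analyzes.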