Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here, we empirically corroborate our theoretical results via experiments on several datasets. We provide additional details in Section B." Figure 1: "Plot of objective value of problem (8) which is solved using IRLS (Algorithm 1) for a toy 1D dataset with n = 5. The iterates are compared to the optimal value obtained by solving (8) using CVXPY (blue)." Table 1: "Test accuracies for UCI experiments with 75%/25% training-test split. Our approach achieves either higher or the same accuracy for 26 out of 33 datasets."
Researcher Affiliation | Collaboration | Rajat Vadiraj Dwaraknath, Stanford University, rajatvd@stanford.edu; Tolga Ergen, LG AI Research, tergen@lgresearch.ai; Mert Pilanci, Stanford University, pilanci@stanford.edu
Pseudocode | Yes | Algorithm 1: Iteratively Reweighted Least Squares (IRLS) for gated ReLU and ReLU networks (a NumPy sketch of this loop is given after the table).
  1: Set iteration count k ← 0
  2: Initialize weights η_i^(0)
  3: Set Φ_i := D_i X for each D_i ∈ D_X
  4: while not converged and k ≤ max iteration count do
  5:   Solve the weighted ℓ2-regularized least squares problem: {w_i^(k)}_i = argmin_{w_i} ‖Σ_i Φ_i w_i − y‖_2^2 + λ Σ_i ‖w_i‖_2^2 / η_i^(k)
  6:   Update the weights: η_i^(k+1) = √(‖w_i^(k)‖_2^2 + ε)
  7:   Increment iteration count: k ← k + 1
  8: end while
  9: Optional: Convert the gated ReLU network to a ReLU network (see Section E for details)
Open Source Code | No | No explicit statement or link providing open-source code for the methodology was found.
Open Datasets | Yes | "We compare the regularized NTK with our IRLS algorithm (Algorithm 1) on the UCI ML Repository datasets. We follow the procedure described in [43] for n ≤ 1000 to extract and standardize the datasets." (A generic standardize-and-split sketch is given after the table.) Cited dataset reference: [55] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017.
Dataset Splits | No | The paper states: "For these experiments, we use the 75%/25% ratio for the training and test set splits." This specifies training and test splits but does not mention a validation set, which this question about training/validation/test splits requires for a 'Yes' answer.
Hardware Specification | Yes | The experiments were run locally on a MacBook Air (2020) with the Apple M1 chip.
Software Dependencies | No | The paper mentions: "For directly solving the group lasso problem (8), we used CVXPY with the MOSEK solver [54]." and "Specifically, we used the LSQR and LSMR methods as described in [41, 42]." However, it does not give version numbers for these software components (e.g., the CVXPY, MOSEK, or LSQR/LSMR library versions). A hedged CVXPY sketch of the group lasso solve appears after the table.
Experiment Setup | Yes | "For the black dashed line in Figure 1, we trained a two-layer ReLU network with 100 hidden neurons using the standard weight decay regularized objective with gradient descent using a learning rate of 0.01 for 200000 epochs. We use the squared loss and tune the regularization parameter λ for both NTK and our approach by performing a grid search over the set {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1}." A minimal sketch of such a λ grid search follows the table.
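
The pseudocode quoted in the Pseudocode row maps onto a simple reweighted ridge loop. Below is a minimal NumPy sketch of that loop, assuming the group-lasso form of problem (8) with fixed feature matrices Φ_i = D_i X; the names irls_group_lasso, Phis, lam, and eps are ours for illustration, and the objective scaling may differ from the paper's implementation.

```python
import numpy as np

def irls_group_lasso(Phis, y, lam=1e-3, eps=1e-8, max_iter=100, tol=1e-8):
    """Sketch of an IRLS loop for a group-lasso objective of the form
        min_{w_1..w_P}  ||sum_i Phi_i w_i - y||_2^2 + lam * sum_i ||w_i||_2.
    Each iteration solves a weighted ridge problem in closed form and then
    updates the group weights eta_i = sqrt(||w_i||_2^2 + eps)."""
    P, d = len(Phis), Phis[0].shape[1]
    Phi = np.hstack(Phis)              # (n, P*d) stacked feature matrices
    eta = np.ones(P)                   # group weights eta_i^(0)
    prev_obj = np.inf
    for _ in range(max_iter):
        # Solve argmin_w ||Phi w - y||^2 + lam * sum_i ||w_i||^2 / eta_i
        ridge = lam / np.repeat(eta, d)
        w = np.linalg.solve(Phi.T @ Phi + np.diag(ridge), Phi.T @ y)
        # Update the reweighting terms from the new group norms
        W = w.reshape(P, d)
        eta = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        obj = np.sum((Phi @ w - y) ** 2) + lam * np.linalg.norm(W, axis=1).sum()
        if abs(prev_obj - obj) < tol:  # simple convergence check
            break
        prev_obj = obj
    return W
```

The closed-form ridge solve keeps the sketch short; the LSQR/LSMR methods quoted in the Software Dependencies row suggest an iterative least-squares solver would be used at larger scale.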
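
The Software Dependencies row quotes that problem (8) was solved directly with CVXPY and MOSEK. The snippet below is a hedged sketch of how such a group-lasso problem can be posed in CVXPY; solve_group_lasso_cvxpy and its arguments are illustrative names, and MOSEK requires a separate license (otherwise CVXPY's default solver is used).

```python
import numpy as np
import cvxpy as cp

def solve_group_lasso_cvxpy(Phis, y, lam=1e-3, use_mosek=False):
    """Pose min_{w_1..w_P} ||sum_i Phi_i w_i - y||_2^2 + lam * sum_i ||w_i||_2
    as a convex program and hand it to CVXPY (optionally with MOSEK)."""
    P, d = len(Phis), Phis[0].shape[1]
    ws = [cp.Variable(d) for _ in range(P)]
    residual = sum(Phi @ w for Phi, w in zip(Phis, ws)) - y
    objective = cp.sum_squares(residual) + lam * sum(cp.norm(w, 2) for w in ws)
    prob = cp.Problem(cp.Minimize(objective))
    if use_mosek:
        prob.solve(solver=cp.MOSEK)   # requires a MOSEK license
    else:
        prob.solve()                  # falls back to an installed default solver
    return np.stack([w.value for w in ws])
```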
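
The Open Datasets and Dataset Splits rows mention extracting, standardizing, and splitting the UCI datasets 75%/25%. The helper below is a generic sketch of that kind of preprocessing, not the exact procedure of [43]; in particular, standardizing with training-set statistics is our assumption, not a detail stated in the report.

```python
import numpy as np

def standardize_and_split(X, y, test_frac=0.25, seed=0):
    """Shuffle a dataset, hold out test_frac for testing (75%/25% by default),
    and z-score the features using statistics computed on the training split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + 1e-12   # avoid division by zero on constant features
    return (X_train - mean) / std, y[train_idx], (X_test - mean) / std, y[test_idx]
```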
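
Finally, the Experiment Setup row quotes a grid search for the regularization parameter λ over {10^-5, ..., 10^1}. The following is a minimal sketch of such a search; fit_and_score is a hypothetical callable (not from the paper) that trains a model at a given λ and returns a score to maximize, for example held-out accuracy.

```python
LAMBDA_GRID = [10.0 ** k for k in range(-5, 2)]   # {1e-5, 1e-4, ..., 1e0, 1e1}

def grid_search_lambda(fit_and_score, lambdas=LAMBDA_GRID):
    """Evaluate fit_and_score(lam) for every lambda in the grid and return the
    best value together with all scores."""
    scores = {lam: fit_and_score(lam) for lam in lambdas}
    best = max(scores, key=scores.get)
    return best, scores
```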