Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we empirically corroborate our theoretical results via experiments on several datasets. We provide additional details in Section B. Figure 1: Plot of objective value of problem (8) which is solved using IRLS (Algorithm 1) for a toy 1D dataset with n = 5. The iterates are compared to the optimal value obtained by solving (8) using CVXPY (blue). Table 1: Test accuracies for UCI experiments with 75%/25% training-test split. Our approach achieves either higher or the same accuracy for 26 out of 33 datasets. |
| Researcher Affiliation | Collaboration | Rajat Vadiraj Dwaraknath, Stanford University, rajatvd@stanford.edu; Tolga Ergen, LG AI Research, tergen@lgresearch.ai; Mert Pilanci, Stanford University, pilanci@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Iteratively Reweighted Least Squares (IRLS) for gated ReLU and ReLU networks. 1: Set iteration count k ← 0. 2: Initialize weights η_i^(0). 3: Set Φ_i := D_i X for each D_i ∈ D_X. 4: while not converged and k ≤ max iteration count do 5: Solve the weighted ℓ2-regularized least squares problem: {w_i^(k)} = argmin_{w_i} (1/2)‖Σ_i Φ_i w_i − y‖_2^2 + (β/2) Σ_i ‖w_i‖_2^2 / η_i^(k). 6: Update the weights: η_i^(k+1) = √(‖w_i^(k)‖_2^2 + ϵ). 7: Increment iteration count: k ← k + 1. 8: end while. 9: Optional: convert the gated ReLU network to a ReLU network (see Section E for details). A runnable sketch of this IRLS loop is given below the table. |
| Open Source Code | No | No explicit statement or link for providing open-source code for the methodology was found. |
| Open Datasets | Yes | We compare the regularized NTK with our IRLS algorithm (Algorithm 1) on the UCI ML Repository datasets. We follow the procedure described in [43] for n ≤ 1000 to extract and standardize the datasets. [55] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017. |
| Dataset Splits | No | The paper states: "For these experiments, we use the 75%/25% ratio for the training and test set splits." This specifies training and test splits but does not mention a validation set, which is required for a 'Yes' answer about training/test/validation splits. |
| Hardware Specification | Yes | The experiments were run locally on a MacBook Air (2020) with the Apple M1 chip. |
| Software Dependencies | No | The paper mentions: "For directly solving the group lasso problem (8), we used CVXPY with the MOSEK solver [54]." and "Specifically, we used the LSQR and LSMR methods as described in [41, 42]." However, it does not provide version numbers for these software components (e.g., the CVXPY, MOSEK, or LSQR/LSMR library versions). A hedged CVXPY sketch of the direct solve is given below the table. |
| Experiment Setup | Yes | For the black dashed line in Figure 1, we trained a two-layer ReLU network with 100 hidden neurons using the standard weight decay regularized objective with gradient descent, using a learning rate of 0.01 for 200000 epochs. We use the squared loss and tune the regularization parameter λ for both the NTK and our approach by performing a grid search over the set {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1}. A hedged training sketch is also given below the table. |
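
To make the Pseudocode row concrete, here is a minimal NumPy sketch of the IRLS loop in Algorithm 1, assuming the group lasso objective (8) has the form min_w ½‖Σ_i Φ_i w_i − y‖₂² + β Σ_i ‖w_i‖₂ with Φ_i = D_i X. The function name `irls_group_lasso`, the closed-form normal-equations inner solve, and the default hyperparameters are our own choices, not the authors' code (the paper mentions iterative solvers such as LSQR/LSMR for the inner step).

```python
import numpy as np

def irls_group_lasso(Phi_list, y, beta=1e-3, eps=1e-8, max_iter=100, tol=1e-8):
    """Hypothetical NumPy sketch of Algorithm 1 (IRLS) for the group lasso
        min_w 0.5*||sum_i Phi_i w_i - y||_2^2 + beta * sum_i ||w_i||_2,
    with one feature block Phi_i = D_i X per gate/arrangement pattern."""
    P = len(Phi_list)                       # number of groups (gates)
    d = Phi_list[0].shape[1]
    Phi = np.hstack(Phi_list)               # stacked design matrix, shape (n, P*d)
    eta = np.ones(P)                        # group weights eta_i^(0)

    for _ in range(max_iter):
        # Weighted l2-regularized least squares step (solved here in closed form;
        # the paper uses iterative solvers such as LSQR/LSMR instead):
        #   w^(k) = argmin_w 0.5*||Phi w - y||^2 + (beta/2) * sum_i ||w_i||^2 / eta_i
        diag = np.repeat(beta / eta, d)
        w = np.linalg.solve(Phi.T @ Phi + np.diag(diag), Phi.T @ y)

        # Update the group weights: eta_i <- sqrt(||w_i||_2^2 + eps)
        W = w.reshape(P, d)
        eta_new = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        if np.max(np.abs(eta_new - eta)) < tol:   # simple convergence check
            eta = eta_new
            break
        eta = eta_new

    return w.reshape(P, d)                  # row i holds the weights of group i
```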
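
The Software Dependencies row quotes the paper as solving the group lasso problem (8) directly with CVXPY and the MOSEK solver. Below is a hedged sketch of what such a direct solve could look like; the helper name `solve_group_lasso_cvxpy` and the exact scaling of the objective are assumptions, not the authors' implementation.

```python
import cvxpy as cp

def solve_group_lasso_cvxpy(Phi_list, y, beta=1e-3):
    """Hypothetical direct solve of a group lasso objective of the form
    0.5*||sum_i Phi_i w_i - y||_2^2 + beta * sum_i ||w_i||_2 with CVXPY."""
    d = Phi_list[0].shape[1]
    W = [cp.Variable(d) for _ in Phi_list]          # one weight block per group
    residual = sum(Phi @ w for Phi, w in zip(Phi_list, W)) - y
    objective = 0.5 * cp.sum_squares(residual) + beta * sum(cp.norm(w, 2) for w in W)
    prob = cp.Problem(cp.Minimize(objective))
    prob.solve(solver=cp.MOSEK)   # MOSEK needs a license; prob.solve() uses the default solver
    return prob.value, [w.value for w in W]
```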
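
For the Experiment Setup row, the following PyTorch sketch mirrors the described baseline: a two-layer ReLU network with 100 hidden neurons trained by full-batch gradient descent (learning rate 0.01, 200000 epochs) on the squared loss, with weight decay standing in for the regularization parameter λ. The initialization, data handling, and helper name are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

def train_two_layer_relu(X, y, lam, hidden=100, lr=0.01, epochs=200_000):
    """Hypothetical sketch of the Figure 1 baseline: a two-layer ReLU network
    trained with full-batch gradient descent on the squared loss, using
    weight decay lam as the regularization term."""
    model = nn.Sequential(
        nn.Linear(X.shape[1], hidden),  # hidden layer with 100 neurons
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=lam)
    for _ in range(epochs):
        opt.zero_grad()
        loss = 0.5 * ((model(X) - y) ** 2).sum()   # squared loss; y has shape (n, 1)
        loss.backward()
        opt.step()
    return model

# Grid search over the regularization parameter, as described in the paper:
# lambda_grid = [10.0 ** k for k in range(-5, 2)]  # {1e-5, ..., 1e1}
```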