Learning Representation from Neural Fisher Kernel with Low-rank Approximation
Authors: Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Joshua M. Susskind
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate NFK in the following settings. We first evaluate the proposed low-rank kernel approximation algorithm (Sec. 3.2), in terms of both approximation accuracy and running-time efficiency. Next, we evaluate NFK on various representation learning tasks in supervised, semi-supervised, and unsupervised learning settings. |
| Researcher Affiliation | Collaboration | Ruixiang Zhang (Mila, Université de Montréal, ruixiang.zhang@umontreal.ca); Shuangfei Zhai, Etai Littwin, Josh Susskind (Apple Inc., {szhai,elittwin,jsusskind}@apple.com) |
| Pseudocode | Yes | Algorithm 1 Baseline method: compute low-rank NFK feature embedding (a hedged sketch of this computation appears below the table) |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | We present our results on CIFAR-10 (Krizhevsky et al., 2009a) in Table 1. ... We evaluate our method on CIFAR-10 (Krizhevsky et al., 2009a) and SVHN datasets (Krizhevsky et al., 2009b). |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10 and SVHN but does not explicitly state the train/validation/test dataset splits (e.g., percentages or sample counts) within the main text or appendices for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU/CPU models, memory, or accelerator counts) used to run the experiments, which is required for reproducibility. |
| Software Dependencies | No | The paper mentions using "Jax (Bradbury et al., 2018)", the "neural-tangents (Novak et al., 2020) library", and "sklearn.decomposition.TruncatedSVD" but does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For the Neural Fisher Kernel Distillation (NFKD) experiments, ... We run 10 power iterations to compute the SVD approximation of the NFK of the teacher model, to obtain the top-20 eigenvectors and eigenvalues. Then we train the student model with the additional NFKD distillation loss using mini-batch stochastic gradient descent, with 0.9 momentum, for 250 epochs. The learning rate starts at 0.1 and is decayed by 0.1 at the 150th epoch and again by 0.1 at the 200th epoch. (Sketches of the power-iteration step and this schedule appear below the table.) |
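The Pseudocode row above quotes Algorithm 1 only by its title. As a reading aid, here is a minimal Python/JAX sketch of a baseline low-rank NFK feature embedding of that flavor: stack per-example parameter gradients into a feature matrix and take a truncated SVD. This is a sketch under assumptions, not the authors' code; `loss_fn` (a per-example scalar loss) is hypothetical, and the Fisher normalization used in the full NFK is omitted for brevity.

```python
import jax
import jax.numpy as jnp

def per_example_grad_features(loss_fn, params, xs):
    # One gradient row per example: vmap over the data batch only.
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0))(params, xs)
    # Flatten each per-example gradient pytree into a single vector.
    return jax.vmap(lambda g: jnp.concatenate(
        [leaf.ravel() for leaf in jax.tree_util.tree_leaves(g)]))(grads)

def low_rank_nfk_embedding(loss_fn, params, xs, k=20):
    # J has shape (n_examples, n_params); its truncated SVD
    # J ~= U @ diag(s) @ Vt yields a rank-k kernel approximation.
    J = per_example_grad_features(loss_fn, params, xs)
    U, s, _ = jnp.linalg.svd(J, full_matrices=False)
    # Row i of the result is the k-dimensional embedding of example i.
    return U[:, :k] * s[:k]
```

Materializing the full gradient matrix costs O(n_examples x n_params) memory, which is presumably why the paper labels this the baseline against its more efficient approximation algorithm of Sec. 3.2.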
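The Experiment Setup row quotes "10 power iterations" used to obtain the top-20 eigenvectors and eigenvalues of the teacher's NFK. A generic block power (subspace) iteration with a Rayleigh-Ritz step matches that description; in this sketch, `kernel_matvec` (applying the implicit NFK Gram matrix to a block of vectors) is a hypothetical stand-in, not an API from the paper.

```python
import jax
import jax.numpy as jnp

def top_k_eigenpairs(kernel_matvec, n, k=20, n_iters=10, seed=0):
    # Random orthonormal starting block of k vectors.
    Q = jax.random.normal(jax.random.PRNGKey(seed), (n, k))
    Q, _ = jnp.linalg.qr(Q)
    for _ in range(n_iters):
        # One block power step, re-orthonormalized for stability.
        Q, _ = jnp.linalg.qr(kernel_matvec(Q))
    # Rayleigh-Ritz: eigendecompose the small k x k projected matrix.
    T = Q.T @ kernel_matvec(Q)
    eigvals, W = jnp.linalg.eigh(T)
    # jnp.linalg.eigh returns ascending eigenvalues; flip to descending.
    return eigvals[::-1], (Q @ W)[:, ::-1]
```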
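The quoted optimization recipe (mini-batch SGD with 0.9 momentum, 250 epochs, learning rate 0.1 decayed by 0.1 at epochs 150 and 200) maps directly onto a piecewise-constant schedule. One way to express it is with optax, below; the paper uses Jax but does not name its optimizer library, so this pairing and the `steps_per_epoch` parameter are assumptions.

```python
import optax

def make_student_optimizer(steps_per_epoch):
    # Piecewise-constant LR: 0.1 initially, multiplied by 0.1 twice.
    schedule = optax.piecewise_constant_schedule(
        init_value=0.1,
        boundaries_and_scales={
            150 * steps_per_epoch: 0.1,  # LR -> 0.01 at epoch 150
            200 * steps_per_epoch: 0.1,  # LR -> 0.001 at epoch 200
        },
    )
    return optax.sgd(learning_rate=schedule, momentum=0.9)
```

Expressing the decay boundaries in optimizer steps rather than epochs keeps the schedule independent of the training loop; the two scale factors reproduce the quoted 0.1 decays at the 150th and 200th epochs.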