Learning Representation from Neural Fisher Kernel with Low-rank Approximation

Authors: Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Joshua M. Susskind

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate NFK in the following settings. We first evaluate the proposed low-rank kernel approximation algorithm (Sec. 3.2) in terms of both approximation accuracy and running-time efficiency. Next, we evaluate NFK on various representation learning tasks in supervised, semi-supervised, and unsupervised learning settings.
Researcher Affiliation | Collaboration | Ruixiang Zhang (Mila, Université de Montréal; ruixiang.zhang@umontreal.ca); Shuangfei Zhai, Etai Littwin, Josh Susskind (Apple Inc.; {szhai,elittwin,jsusskind}@apple.com)
Pseudocode | Yes | Algorithm 1, "Baseline method: compute low-rank NFK feature embedding" (a hedged sketch of this baseline appears after the table)
Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about releasing the source code for the methodology described.
Open Datasets | Yes | We present our results on CIFAR-10 (Krizhevsky et al., 2009a) in Table 1. ... We evaluate our method on CIFAR-10 (Krizhevsky et al., 2009a) and SVHN datasets (Krizhevsky et al., 2009b).
Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10 and SVHN but does not explicitly state the train/validation/test dataset splits (e.g., percentages or sample counts) within the main text or appendices for reproducibility.
Hardware Specification | No | The paper mentions …
Software Dependencies | No | The paper mentions using "Jax (Bradbury et al., 2018)", the "neural-tangents (Novak et al., 2020) library", and "sklearn.decomposition.TruncatedSVD", but does not specify exact version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | For the Neural Fisher Kernel Distillation (NFKD) experiments, ... We run 10 power iterations to compute the SVD approximation of the NFK of the teacher model, to obtain the top-20 eigenvectors and eigenvalues. Then we train the student model with the additional NFKD distillation loss using mini-batch stochastic gradient descent with 0.9 momentum for 250 epochs. The initial learning rate is 0.1, and we decay it by a factor of 0.1 at the 150th epoch and again at the 200th epoch. (Hedged sketches of the power-iteration step and the optimizer schedule appear after the table.)
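
The Pseudocode row refers to Algorithm 1, the baseline method for computing a low-rank NFK feature embedding. Below is a minimal sketch of that baseline, assuming a scalar-output model apply_fn(params, x) applied to a single example and replacing the Fisher normalization with a simple 1/sqrt(n) scaling; the function names and scaling are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the baseline low-rank NFK embedding (cf. Algorithm 1).
# Assumptions: apply_fn(params, x) returns a scalar for a single example x,
# and the Fisher normalization is approximated by a 1/sqrt(n) scaling.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


def per_example_grad_features(apply_fn, params, xs):
    """Stack the flattened per-example gradients d f(x; params) / d params."""
    flat_params, unravel = ravel_pytree(params)

    def f_single(flat_p, x):
        return apply_fn(unravel(flat_p), x)

    grad_fn = jax.grad(f_single)  # gradient w.r.t. the flat parameter vector
    return jax.vmap(grad_fn, in_axes=(None, 0))(flat_params, xs)  # (n, num_params)


def low_rank_nfk_embedding(apply_fn, params, xs, k=20):
    """Top-k NFK feature embedding via a truncated SVD of the feature matrix."""
    V = per_example_grad_features(apply_fn, params, xs) / jnp.sqrt(xs.shape[0])
    U, s, _ = jnp.linalg.svd(V, full_matrices=False)
    return U[:, :k] * s[:k]  # (n, k) embedding of the n examples
```

Materializing the full n × num_params feature matrix is exactly what the paper's low-rank approximation algorithm (Sec. 3.2) is designed to avoid; the next sketch shows the generic power-iteration idea behind such an approximation.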
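The experiment-setup row quotes "10 power iterations to compute the SVD approximation of the NFK of the teacher model." The snippet below is a generic randomized subspace (block power) iteration for the top-k eigenpairs of a symmetric positive semi-definite kernel matrix accessed only through matrix-vector products; it illustrates the idea but is not claimed to reproduce the exact procedure of Sec. 3.2.

```python
# Generic subspace (block power) iteration for the top-k eigenpairs of an
# n x n symmetric PSD kernel matrix K, accessed only through matvec(Q) = K @ Q.
import numpy as np


def top_k_eigenpairs(matvec, n, k=20, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # random orthonormal start
    for _ in range(iters):  # e.g. 10 iterations, as quoted above
        Q, _ = np.linalg.qr(matvec(Q))  # power step + re-orthonormalization
    # Rayleigh-Ritz: solve the small k x k projected eigenproblem.
    evals, evecs = np.linalg.eigh(Q.T @ matvec(Q))
    order = np.argsort(evals)[::-1]
    return evals[order], Q @ evecs[:, order]
```

With an explicit feature matrix V of shape (n, p), one would pass matvec = lambda Q: V @ (V.T @ Q); in the NFK setting the same product can be computed with Jacobian-vector and vector-Jacobian products so that V is never stored.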
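Finally, the quoted optimizer settings (SGD with 0.9 momentum, 250 epochs, learning rate 0.1 decayed by a factor of 0.1 at epochs 150 and 200) can be written as a step-indexed schedule. The sketch below uses optax and a hypothetical steps_per_epoch value; it illustrates the stated schedule only and omits the NFKD distillation loss.

```python
# Sketch of the quoted NFKD student-training schedule using optax.
import optax

steps_per_epoch = 391  # assumed: 50,000 CIFAR-10 training images / batch size 128

lr_schedule = optax.piecewise_constant_schedule(
    init_value=0.1,
    boundaries_and_scales={
        150 * steps_per_epoch: 0.1,  # multiply the learning rate by 0.1 at epoch 150
        200 * steps_per_epoch: 0.1,  # and again at epoch 200
    },
)
optimizer = optax.sgd(learning_rate=lr_schedule, momentum=0.9)
# Train for 250 epochs, i.e. 250 * steps_per_epoch optimizer steps.
```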