On Infinite-Width Hypernetworks
Authors: Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our theory empirically and also demonstrate the utility of this hyperkernel on several functional representation tasks. Our experiments are divided into two main parts. In the first part, we validate the ideas presented in our theoretical analysis and study the effect of the width and depth of g on the optimization of a hypernetwork. In the second part, we evaluate the performance of the NNGP and NTK kernels on image representation tasks. |
| Researcher Affiliation | Collaboration | Etai Littwin, School of Computer Science, Tel Aviv University, Tel Aviv, Israel, etai.littwin@gmail.com; Tomer Galanti, School of Computer Science, Tel Aviv University, Tel Aviv, Israel, tomerga2@tauex.tau.ac.il; Lior Wolf, School of Computer Science, Tel Aviv University, Tel Aviv, Israel, wolf@cs.tau.ac.il; Greg Yang, Microsoft Research AI, gregyang@microsoft.com |
| Pseudocode | No | The paper provides mathematical derivations and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We experimented with the MNIST [18] and CIFAR10 [17] datasets. For each dataset we took 10000 training samples only. |
| Dataset Splits | No | The paper mentions using '10000 training samples' and a test set, but it does not give explicit percentages or sample counts for training, validation, and test splits, does not explain how the splits were made, and does not state whether a validation set was used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing the models and training process but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The hypernetwork, f, is a fully-connected ReLU neural network of depth 4 and width 200. The primary network g is a fully-connected ReLU neural network of depth {3, 6, 8}. Since the MNIST rotations dataset is simpler, we varied the width of g in {10, 50, 100}, and for the CIFAR10 variation we selected the width of g to be {100, 200, 300}. The network outputs 12 values and is trained using the cross-entropy loss. We trained the hypernetworks for 100 epochs, using the SGD method with batch size 100 and learning rate µ = 0.01. (A minimal code sketch of this setup follows the table.) |
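Below is a minimal sketch of the quoted setup, written in PyTorch purely for illustration: the paper does not name a framework, and the names `HyperNetwork`, `mlp_param_count`, `hyper_in`, and the pairing of inputs `z` and `x` are hypothetical. It only mirrors the stated configuration: f is a fully-connected ReLU network of depth 4 and width 200 that emits the flattened parameters of the primary ReLU network g, whose output has 12 values trained with cross-entropy via SGD (batch size 100, learning rate 0.01, 100 epochs).

```python
import torch
import torch.nn as nn

def mlp_param_count(sizes):
    """Total number of weights and biases in a fully-connected net with layer sizes `sizes`."""
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

class HyperNetwork(nn.Module):
    """f: maps a hyper-input z to the parameters of the primary ReLU network g."""
    def __init__(self, hyper_in, g_sizes, hyper_width=200, hyper_depth=4):
        super().__init__()
        self.g_sizes = g_sizes
        layers, d = [], hyper_in
        for _ in range(hyper_depth - 1):
            layers += [nn.Linear(d, hyper_width), nn.ReLU()]
            d = hyper_width
        # Last layer of f emits the flattened weights and biases of g.
        layers.append(nn.Linear(d, mlp_param_count(g_sizes)))
        self.f = nn.Sequential(*layers)

    def forward(self, z, x):
        # z: (hyper_in,) hyper-input; x: (batch, g_sizes[0]) primary-network input.
        theta = self.f(z)                       # flattened parameters of g
        h, offset = x, 0
        for i in range(len(self.g_sizes) - 1):
            n_in, n_out = self.g_sizes[i], self.g_sizes[i + 1]
            W = theta[offset:offset + n_in * n_out].view(n_out, n_in)
            offset += n_in * n_out
            b = theta[offset:offset + n_out]
            offset += n_out
            h = h @ W.t() + b
            if i < len(self.g_sizes) - 2:       # ReLU on the hidden layers of g
                h = torch.relu(h)
        return h                                # 12 output logits, used with cross-entropy

# Example grid point: g of depth 3 and width 100 on flattened CIFAR10 images.
model = HyperNetwork(hyper_in=64, g_sizes=[3 * 32 * 32, 100, 100, 12])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # SGD, batch size 100, 100 epochs
criterion = nn.CrossEntropyLoss()
```

The width and depth grids from the table would be swept by changing `g_sizes` (depth in {3, 6, 8}; width in {10, 50, 100} for MNIST rotations and {100, 200, 300} for the CIFAR10 variation) while keeping f fixed at depth 4 and width 200.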