Neural Functional Transformers

Authors: Allan Zhou, Kaien Yang, Yiding Jiang, Kaylee Burns, Winnie Xu, Samuel Sokota, J. Zico Kolter, Chelsea Finn

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments processing the weights of feedforward MLPs and CNNs, we find that NFTs match or exceed the performance of prior weight-space methods. We also leverage NFTs to develop INR2ARRAY, a novel method for computing permutation invariant latent representations from the weights of implicit neural representations (INRs). Our proposed method improves INR classification accuracy by up to +17% over existing methods.
Researcher Affiliation | Academia | Stanford University, Carnegie Mellon University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks that are clearly labeled as such.
Open Source Code | Yes | We provide an implementation of our layers at https://github.com/AllanYangZhou/nfn.
Open Datasets | Yes | For our INR experiments, we construct datasets for MNIST [25], FashionMNIST [42], and CIFAR-10 [23] following the procedure of Zhou et al. [46] exactly.
Dataset Splits | Yes | We use 20% of the non-test data for validation, and train on the remaining 12000. We split the SIRENs of each dataset into training, validation, and testing sets, and train the encoder and decoder using only the training set. We use the validation error to sweep over the following hyperparameters (for both the NFT and NFN variants): # blocks (4 vs 6), # channels (256 vs 512), MLP hidden dim (512 vs 1024), Fourier scale (3 vs 10), # attn heads (4 vs 8), and dropout (0.1 vs 0). After sweeping these hyperparameters, we use the best hyperparameter configuration to train the encoder and decoder and do early stopping with the validation error.
Hardware Specification | No | The paper reports training times (e.g., 5H, 2H, 13H) in Appendix B, but it does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions optimizers such as Adam and AdamW and components such as a Transformer classification head, but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Hyperparameters (Predicting gen. / Editing INRs / INR2ARRAY): # blocks 4 / 4 / 6; # channels 256 / 256 / 256; MLP hidden dim 1024 / 512 / 1024; Fourier scale 3 / 3 / 3; Fourier size 128 / 128 / 128; # attn heads 4 / 8 / 4; dropout p 0.1 / 0.1 / 0; invariant layer HNP / CA / CA; M 16; CA dim 256; Optimizer Adam / Adam / AdamW; Learning rate 0.001 / 0.001 / 0.0001; Weight decay coeff 0 / 0 / 0.01; LR warmup steps 10K / 10K / 10K; Total params 46M / 7M / 22M; Training steps 75K / 50K / 200K; Training time 5H / 2H / 13H
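
The "Research Type" row quotes the paper's claim that NFTs process network weights with permutation invariance. As background, the sketch below (my own illustration, not code from the paper or its repository) verifies the hidden-neuron permutation symmetry that such weight-space methods are built around: permuting an MLP's hidden neurons, together with the matching permutation of the next layer's weights, leaves the network's function unchanged.

```python
# Check that hidden-neuron permutations are a symmetry of MLP weight space.
import torch
import torch.nn as nn

torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(5, 4)
y = mlp(x)

perm = torch.randperm(8)
with torch.no_grad():
    mlp[0].weight.copy_(mlp[0].weight[perm])     # permute rows of W1
    mlp[0].bias.copy_(mlp[0].bias[perm])         # and the matching bias entries
    mlp[2].weight.copy_(mlp[2].weight[:, perm])  # permute columns of W2

assert torch.allclose(y, mlp(x), atol=1e-6)  # the network's function is unchanged
```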
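
The "Open Datasets" row refers to datasets of SIREN weights fit to MNIST, FashionMNIST, and CIFAR-10 images. Below is a minimal sketch of how such a weight dataset can be constructed; the architecture sizes, initialization, and training settings are illustrative assumptions, not the exact procedure of Zhou et al. [46].

```python
# Fit a small SIREN to each image so that its weights become one dataset example.
import torch
import torch.nn as nn


class Sine(nn.Module):
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)


def make_siren(in_dim=2, hidden=32, out_dim=1, depth=3):
    layers, d = [], in_dim
    for _ in range(depth - 1):
        layers += [nn.Linear(d, hidden), Sine()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


def fit_siren_to_image(img, steps=1000, lr=1e-4):
    """img: (H, W) tensor in [0, 1]. Returns the trained SIREN's state_dict."""
    h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    targets = img.reshape(-1, 1)

    siren = make_siren()
    opt = torch.optim.Adam(siren.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((siren(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return {k: v.detach().clone() for k, v in siren.state_dict().items()}


# Usage: weight_dataset = [fit_siren_to_image(img) for img in images]
```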
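
The "Dataset Splits" row describes a 20% validation split of the non-test data, a sweep over the listed hyperparameters, and early stopping on validation error. The following is a schematic of that selection protocol under those stated choices; `train_model` and `validation_error` are hypothetical stand-ins for the actual NFT/NFN training and evaluation code.

```python
# Hold out 20% for validation, grid-search the swept hyperparameters,
# and return the configuration with the lowest validation error.
import itertools
import random

SEARCH_SPACE = {
    "num_blocks": [4, 6],
    "num_channels": [256, 512],
    "mlp_hidden_dim": [512, 1024],
    "fourier_scale": [3, 10],
    "num_attn_heads": [4, 8],
    "dropout": [0.1, 0.0],
}


def split_train_val(non_test_data, val_frac=0.2, seed=0):
    data = list(non_test_data)
    random.Random(seed).shuffle(data)
    n_val = int(len(data) * val_frac)
    return data[n_val:], data[:n_val]  # train, validation


def grid_search(train_data, val_data, train_model, validation_error):
    best_cfg, best_err = None, float("inf")
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        cfg = dict(zip(keys, values))
        model = train_model(cfg, train_data)
        err = validation_error(model, val_data)
        if err < best_err:
            best_cfg, best_err = cfg, err
    # The best configuration is then retrained with early stopping on
    # validation error, per the paper's description.
    return best_cfg
```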
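
For reference, the main settings from the "Experiment Setup" row transcribed as Python dictionaries. The key names are mine, and assigning M and CA dim to the INR2ARRAY configuration is my reading of the flattened table; all values are taken from the paper's reported hyperparameters.

```python
# Reported hyperparameters for the three experiments (transcription, not code
# from the paper's repository).
EXPERIMENT_CONFIGS = {
    "predicting_generalization": {
        "num_blocks": 4, "num_channels": 256, "mlp_hidden_dim": 1024,
        "fourier_scale": 3, "fourier_size": 128, "num_attn_heads": 4,
        "dropout": 0.1, "invariant_layer": "HNP",
        "optimizer": "Adam", "lr": 1e-3, "weight_decay": 0.0,
        "lr_warmup_steps": 10_000, "training_steps": 75_000,
    },
    "editing_inrs": {
        "num_blocks": 4, "num_channels": 256, "mlp_hidden_dim": 512,
        "fourier_scale": 3, "fourier_size": 128, "num_attn_heads": 8,
        "dropout": 0.1, "invariant_layer": "CA",
        "optimizer": "Adam", "lr": 1e-3, "weight_decay": 0.0,
        "lr_warmup_steps": 10_000, "training_steps": 50_000,
    },
    "inr2array": {
        "num_blocks": 6, "num_channels": 256, "mlp_hidden_dim": 1024,
        "fourier_scale": 3, "fourier_size": 128, "num_attn_heads": 4,
        "dropout": 0.0, "invariant_layer": "CA",
        "M": 16, "ca_dim": 256,  # column assignment assumed
        "optimizer": "AdamW", "lr": 1e-4, "weight_decay": 0.01,
        "lr_warmup_steps": 10_000, "training_steps": 200_000,
    },
}
```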