Learning Signal-Agnostic Manifolds of Neural Fields

Authors: Yilun Du, Katie Collins, Josh Tenenbaum, Vincent Sitzmann

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the generality of GEM by first showing that our model is capable of fitting diverse signal modalities. Next, we demonstrate that our approach captures the underlying structure across these signals; we are not only able to cluster and perceptually interpolate between signals, but inpaint to complete partial ones. Finally, we show that we can draw samples from the learned manifold of each signal type, illustrating the power of GEM to be used as a signal-agnostic generative model. We re-iterate that nearly identical architectures and training losses are used across separate modalities. ... Quantitative Comparisons. Next, we provide quantitative evaluations of generations in Table 2 on image and shape modalities.
Researcher Affiliation | Academia | Yilun Du (yilundu@mit.edu), Katherine Collins (katiemc@mit.edu), Joshua B. Tenenbaum (jbt@mit.edu), Vincent Sitzmann (sitzmann@mit.edu); MIT CSAIL, MIT BCS, MIT CBMM.
Pseudocode | No | The paper describes algorithms but does not include a dedicated pseudocode block or a clearly labeled 'Algorithm' figure.
Open Source Code | Yes | Code and additional results are available at https://yilundu.github.io/gem/.
Open Datasets | Yes | Datasets. We evaluate GEM on four signal modalities: image, audio, 3D shape, and cross-modal image and audio signals, respectively. For the image modality, we investigate performance on the CelebA-HQ dataset [38], fit on 29000 64×64 training celebrity images, and test on 1000 64×64 test images. To study GEM behavior on audio signals, we use the NSynth dataset [39], and fit on a training set of 10000 one-second 16 kHz sound clips of different instruments playing, and test on 5000 one-second 16 kHz sound clips. ... For the 3D shape domain, we work with the ShapeNet dataset from [40]. ... Finally, for the cross-modal image and audio modality, we utilize the cello image and audio recordings from the Sub-URMP dataset [41].
Dataset Splits | No | The paper specifies training and test set sizes but does not explicitly mention a validation set or provide details on how a validation split was created or used (e.g., percentages or specific counts for validation).
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions software such as the 'PyTorch VAE library' and 'StyleGAN2' but does not provide specific version numbers for these libraries or for the underlying framework (e.g., PyTorch version).
Experiment Setup | Yes | We train all approaches with a latent dimension of 1024, and re-scale the size of the VAE to ensure parameter counts are similar. We report model architecture details in the appendix. ... We use a three-layer multilayer perceptron (MLP) with hidden dimension 512 as our Φ ... our hypernetwork is parameterized as a three-layer MLP.
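
The Experiment Setup row fixes only a few architectural facts: a 1024-dimensional latent code, a three-layer MLP Φ with hidden dimension 512, and a three-layer MLP hypernetwork that produces Φ's weights. The PyTorch sketch below wires those pieces together for orientation; the hypernetwork width, Φ's input/output dimensions (2D pixel coordinates to RGB), and the ReLU activations are assumptions rather than details confirmed by the paper.

```python
import math

import torch
import torch.nn as nn

# Sketch of the quoted setup: a 1024-d latent is mapped by a three-layer MLP
# hypernetwork to the weights of a three-layer MLP neural field Phi with
# hidden dimension 512. Phi's input/output sizes (2-D coordinates -> RGB),
# the hypernetwork width, and the ReLU activations are assumptions.

LATENT_DIM = 1024        # stated in the paper
HIDDEN_DIM = 512         # hidden width of Phi, stated in the paper
IN_DIM, OUT_DIM = 2, 3   # assumed: pixel coordinates -> RGB values

# Shapes of Phi's three weight matrices and bias vectors, in forward order.
PHI_SHAPES = [
    (HIDDEN_DIM, IN_DIM), (HIDDEN_DIM,),
    (HIDDEN_DIM, HIDDEN_DIM), (HIDDEN_DIM,),
    (OUT_DIM, HIDDEN_DIM), (OUT_DIM,),
]
N_PHI_PARAMS = sum(math.prod(s) for s in PHI_SHAPES)


class HyperNetwork(nn.Module):
    """Three-layer MLP mapping a latent code to the flattened parameters of Phi."""

    def __init__(self, latent_dim=LATENT_DIM, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_PHI_PARAMS),
        )

    def forward(self, z):
        return self.net(z)  # (batch, N_PHI_PARAMS)


def phi_forward(params, coords):
    """Evaluate the generated field Phi at query coordinates.

    params: flattened weights for one signal, shape (N_PHI_PARAMS,)
    coords: query coordinates, shape (num_points, IN_DIM)
    """
    out, offset = coords, 0
    for i in range(0, len(PHI_SHAPES), 2):
        w_shape, b_shape = PHI_SHAPES[i], PHI_SHAPES[i + 1]
        w_numel = math.prod(w_shape)
        W = params[offset:offset + w_numel].view(*w_shape)
        offset += w_numel
        b = params[offset:offset + b_shape[0]]
        offset += b_shape[0]
        out = out @ W.t() + b
        if i < len(PHI_SHAPES) - 2:  # no activation after the output layer
            out = torch.relu(out)
    return out


# Usage: decode one latent into a field and query it on a 64x64 pixel grid.
hyper = HyperNetwork()
z = torch.randn(1, LATENT_DIM)
params = hyper(z)[0]
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
rgb = phi_forward(params, coords).reshape(64, 64, OUT_DIM)
```

The quote also notes that baseline models such as the VAE are re-scaled so that parameter counts are comparable; nothing in the sketch above enforces that constraint.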
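
The Research Type row additionally quotes that GEM can "perceptually interpolate between signals" and draw samples from the learned manifold. The quoted text does not state the interpolation rule, so the helper below, which reuses HyperNetwork and phi_forward from the sketch above, simply blends two latent codes linearly and decodes each intermediate field; treat it as an illustration, not the paper's procedure.

```python
import torch

# Illustration only: linear blending of two latent codes, decoded with the
# HyperNetwork / phi_forward sketch above. GEM's actual interpolation along
# the learned manifold may differ from a straight line in latent space.
def interpolate_fields(hyper, z_a, z_b, coords, steps=8):
    """Decode fields along a straight line between two latent codes."""
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_a + t * z_b   # assumed: linear latent blend
        params = hyper(z)[0]
        frames.append(phi_forward(params, coords))
    return torch.stack(frames)          # (steps, num_points, OUT_DIM)


# Example: interpolate between two random latents on the 64x64 grid above.
# frames = interpolate_fields(hyper, torch.randn(1, LATENT_DIM),
#                             torch.randn(1, LATENT_DIM), coords)
```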