Learning Signal-Agnostic Manifolds of Neural Fields
Authors: Yilun Du, Katie Collins, Josh Tenenbaum, Vincent Sitzmann
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the generality of GEM by first showing that our model is capable of fitting diverse signal modalities. Next, we demonstrate that our approach captures the underlying structure across these signals; we are not only able to cluster and perceptually interpolate between signals, but inpaint to complete partial ones. Finally, we show that we can draw samples from the learned manifold of each signal type, illustrating the power of GEM to be used as a signal-agnostic generative model. We reiterate that nearly identical architectures and training losses are used across separate modalities. ... Quantitative Comparisons. Next, we provide quantitative evaluations of generations in Table 2 on image and shape modalities. |
| Researcher Affiliation | Academia | Yilun Du (yilundu@mit.edu), Katherine Collins (1,2; katiemc@mit.edu), Joshua B. Tenenbaum (1,2,3; jbt@mit.edu), Vincent Sitzmann (1; sitzmann@mit.edu). Affiliations: 1 MIT CSAIL, 2 MIT BCS, 3 MIT CBMM |
| Pseudocode | No | The paper describes algorithms but does not include a dedicated pseudocode block or a clearly labeled 'Algorithm' figure. |
| Open Source Code | Yes | Code and additional results are available at https://yilundu.github.io/gem/. |
| Open Datasets | Yes | Datasets. We evaluate GEM on four signal modalities: image, audio, 3D shape, and cross-modal image and audio signals, respectively. For the image modality, we investigate performance on the CelebA-HQ dataset [38], fit on 29000 64×64 training celebrity images, and test on 1000 64×64 test images. To study GEM behavior on audio signals, we use the NSynth dataset [39], fit on a training set of 10000 one-second 16 kHz sound clips of different instruments playing, and test on 5000 one-second 16 kHz sound clips. ... For the 3D shape domain, we work with the ShapeNet dataset from [40]. ... Finally, for the cross-modal image and audio modality, we utilize the cello image and audio recordings from the Sub-URMP dataset [41]. |
| Dataset Splits | No | The paper specifies training and test set sizes but does not explicitly mention a validation set or provide details on how a validation split was created or used (e.g., percentages or specific counts for validation). |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software like a 'PyTorch VAE library' and 'StyleGAN2' but does not provide specific version numbers for these libraries or for the underlying framework (e.g., the PyTorch version). |
| Experiment Setup | Yes | We train all approaches with a latent dimension of 1024, and re-scale the size of the VAE to ensure parameter counts are similar. We report model architecture details in the appendix... We use a three-layer multilayer perceptron (MLP) with hidden dimension 512 as our Φ... our hypernetwork is parameterized as a three-layer MLP (see the sketch below the table). |
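
The Experiment Setup row quotes the key architecture hyperparameters: a latent dimension of 1024, a three-layer MLP with hidden dimension 512 as the neural field Φ, and a three-layer MLP hypernetwork. The sketch below illustrates how such a hypernetwork-decoded neural field could be wired up in PyTorch. It is a minimal illustration, not the authors' implementation: the module names, coordinate/signal dimensionalities, and ReLU activations are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Minimal sketch of the setup described above (hypothetical names; layer sizes
# follow the quoted hyperparameters: latent dim 1024, three-layer MLP with
# hidden dim 512 as the neural field Phi, and a three-layer MLP hypernetwork
# that predicts Phi's weights from a latent code). Activations and the
# coordinate/signal dimensions are illustrative assumptions.

LATENT_DIM = 1024
HIDDEN_DIM = 512
COORD_DIM = 2      # e.g. (x, y) pixel coordinates for the image modality
SIGNAL_DIM = 3     # e.g. RGB values

# Shapes of Phi's parameters: three linear layers (weight and bias each).
phi_shapes = [
    (HIDDEN_DIM, COORD_DIM), (HIDDEN_DIM,),
    (HIDDEN_DIM, HIDDEN_DIM), (HIDDEN_DIM,),
    (SIGNAL_DIM, HIDDEN_DIM), (SIGNAL_DIM,),
]
n_phi_params = sum(torch.Size(s).numel() for s in phi_shapes)

# Hypernetwork: three-layer MLP mapping a latent code to all of Phi's parameters.
hypernet = nn.Sequential(
    nn.Linear(LATENT_DIM, HIDDEN_DIM), nn.ReLU(),
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.ReLU(),
    nn.Linear(HIDDEN_DIM, n_phi_params),
)

def phi_forward(coords: torch.Tensor, flat_params: torch.Tensor) -> torch.Tensor:
    """Evaluate the neural field Phi at `coords` using the predicted parameters."""
    params, offset = [], 0
    for shape in phi_shapes:
        numel = torch.Size(shape).numel()
        params.append(flat_params[offset:offset + numel].view(shape))
        offset += numel
    w1, b1, w2, b2, w3, b3 = params
    h = torch.relu(coords @ w1.t() + b1)
    h = torch.relu(h @ w2.t() + b2)
    return h @ w3.t() + b3

# Usage: decode one latent code into a signal sampled on a coordinate grid.
z = torch.randn(LATENT_DIM)                  # latent code on the learned manifold
coords = torch.rand(64 * 64, COORD_DIM)      # query coordinates
signal = phi_forward(coords, hypernet(z))    # predicted signal values
print(signal.shape)                          # torch.Size([4096, 3])
```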