Learning to See by Looking at Noise
Authors: Manel Baradad Jurjo, Jonas Wulff, Tongzhou Wang, Phillip Isola, Antonio Torralba
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate performance using Imagenet-100 [60] and the Visual Task Adaptation Benchmark [61]. Figures 3 and 4 show the performance for the proposed fully generative methods from noise on Imagenet-100 and VTAB (tables can be found in the Sup. Mat.). |
| Researcher Affiliation | Academia | Manel Baradad, MIT CSAIL, mbaradad@mit.edu; Jonas Wulff, MIT CSAIL, jwulff@csail.mit.edu; Tongzhou Wang, MIT CSAIL, tongzhou@mit.edu; Phillip Isola, MIT CSAIL, phillipi@mit.edu; Antonio Torralba, MIT CSAIL, torralba@mit.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement about releasing open-source code or a link to a code repository for their methodology. |
| Open Datasets | Yes | We evaluate performance using Imagenet-100 [60] and the Visual Task Adaptation Benchmark [61]. As an upper-bound for the maximum expected performance with synthetic images, we consider the same training procedure but using the following real datasets: 1) Places365 [62] ... 2) STL-10 [63] ... 3) Imagenet1k [1] |
| Dataset Splits | Yes | For each of the datasets in VTAB, we fix the number of training and validation samples to 20k at random for the datasets where there are more samples available. |
| Hardware Specification | No | The paper mentions 'computation resources from the Satori cluster donated by IBM to MIT' but does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions models like 'AlexNet-based encoder', 'MoCo v2', and 'StyleGANv2', but does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9'). |
| Experiment Setup | Yes | We generate 105k samples using the proposed image models at 128x128 resolution, which are then downsampled to 96x96 and cropped at random to 64x64 before being fed to the encoder. We fix a common set of hyperparameters for all the methods under test to the values found to perform well by the authors of [58]. (A preprocessing sketch follows the table.) |
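
The reported pipeline (128x128 generated samples downsampled to 96x96, then randomly cropped to 64x64 before the encoder) could be reproduced with a standard torchvision transform chain. The sketch below is an assumption-laden illustration, not the authors' released code: the paper does not name the framework, and the interpolation mode, tensor conversion, and any further augmentations are our own guesses.

```python
# Minimal sketch of the reported preprocessing, assuming a PyTorch/torchvision
# setup. Interpolation mode and lack of extra augmentation are assumptions.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(96),      # 128x128 generated sample -> 96x96
    transforms.RandomCrop(64),  # random 64x64 crop fed to the encoder
    transforms.ToTensor(),      # PIL image -> float tensor in [0, 1]
])
```

Applied to each of the 105k synthetic samples during training, this would yield 64x64 crops per the setup quoted above; any normalization statistics would have to be taken from the authors' configuration.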