Procedural Image Programs for Representation Learning
Authors: Manel Baradad, Chun-Fu (Richard) Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test downstream performance when pre-training with Shaders-1k and Shaders-21k, we perform an initial set of experiments with and without latent class supervision. We test three representation learning methodologies: supervised classification with cross-entropy loss (CE), supervised contrastive learning (SupCon) [27], and unsupervised representation learning (SimCLR) [14]. (See the NT-Xent sketch below the table.) |
| Researcher Affiliation | Collaboration | Manel Baradad¹, Chun-Fu (Richard) Chen², Jonas Wulff³, Tongzhou Wang¹, Rogerio Feris⁴, Antonio Torralba¹, Phillip Isola¹ (¹MIT CSAIL, ²JPMorgan Chase Bank, N.A., ³Xyla, Inc., ⁴MIT-IBM Watson AI Lab) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code, models, and datasets are available at: https://github.com/mbaradad/shaders21k |
| Open Datasets | Yes | Code, models, and datasets are available at: https://github.com/mbaradad/shaders21k |
| Dataset Splits | No | We train for 200 epochs with a batch size of 256, with the rest of the hyperparameters set to those found to work well for ImageNet-100 in the original paper. The images after augmentations are fed to the network at a resolution of 64×64. After training the encoder, we evaluate using the linear protocol described in [17], which consists of only training a linear classifier on top of the representations learned. We evaluate on ImageNet-100, a subset of ImageNet-1k [8] defined in [28], using the averaged pooled features from the last convolutional layer, which have a dimensionality of 512. (See the linear-probe sketch below the table.) |
| Hardware Specification | Yes | Combining all the shaders from both sources, images can be rendered at 979 frames per second at a resolution of 384×384, using a single modern GPU and including transfer to general memory. When stored to disk as JPEG, the average size per image is 54 kB, compared to 70 kB for ImageNet-1k when resized to the same resolution. and rendering time is computed using a single Nvidia GeForce GTX TITAN X, including transfer to general memory. and Experiments were partially conducted using computation resources from the Satori cluster donated by IBM to MIT, and MIT's SuperCloud cluster. |
| Software Dependencies | No | The generative programs of this collection are coded in a common language, the OpenGL shading language, that encapsulates the image generation process in a few lines of code. and We follow the training procedure and public implementation described in [27] for the three methods. |
| Experiment Setup | Yes | We train for 200 epochs with a batch size of 256, with the rest of the hyperparameters set to those found to work well for ImageNet-100 in the original paper. The images after augmentations are fed to the network at a resolution of 64×64. and In this setting, we train a ResNet-50 using MoCo v2 [18], with images generated at 384×384 resolution. We train for 200 epochs with 1.3M images, with a batch size of 256, and set the rest of the hyperparameters to those found to work best on ImageNet-1k in the original paper. and For the MixUp strategy, we found that mixing 6 frames with weights sampled from a Dirichlet distribution with αᵢ = 1 yields the best FID, while rendering time is still affordable. (See the frame-mixing sketch below the table.) |
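
The three pre-training objectives quoted in the Research Type row differ mainly in their loss: CE uses standard cross-entropy over latent classes, SupCon [27] contrasts using label information, and SimCLR [14] is fully unsupervised. Below is a minimal sketch of SimCLR's NT-Xent loss, assuming `(B, D)` projection-head outputs for two augmented views; the temperature value is illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent (SimCLR) loss for two batches of projection-head outputs.

    z1, z2: (B, D) embeddings of two augmented views of the same images.
    """
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    mask = torch.eye(2 * B, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float("-inf"))          # exclude self-pairs
    # The positive for sample i is its other augmented view: i + B or i - B.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)]).to(sim.device)
    return F.cross_entropy(sim, targets)
```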
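
The linear evaluation protocol quoted in the Dataset Splits row trains only a classifier on top of frozen features. A sketch of that setup, assuming a ResNet-18 backbone with 512-d average-pooled output and 100 classes for ImageNet-100; the optimizer and learning rate here are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18()        # in practice, load the pretrained checkpoint here
encoder.fc = nn.Identity()  # expose the 512-d average-pooled features
for p in encoder.parameters():
    p.requires_grad = False  # the backbone stays frozen during linear evaluation
encoder.eval()

classifier = nn.Linear(512, 100)  # ImageNet-100 has 100 classes
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def linear_probe_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    with torch.no_grad():
        feats = encoder(images)  # (B, 512) frozen features
    loss = criterion(classifier(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```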
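
The MixUp strategy quoted in the Experiment Setup row blends several rendered shader frames into one training image, with weights drawn from a Dirichlet distribution. A minimal NumPy sketch, assuming the frames arrive as equally sized uint8 arrays; the function name and interface are hypothetical.

```python
import numpy as np

def mix_frames(frames, alpha: float = 1.0, rng=None) -> np.ndarray:
    """Blend K rendered shader frames into a single training image.

    Weights are drawn from a symmetric Dirichlet(alpha); the paper reports
    that mixing K = 6 frames with alpha_i = 1 gave the best FID.
    """
    rng = rng if rng is not None else np.random.default_rng()
    weights = rng.dirichlet(alpha * np.ones(len(frames)))  # sums to 1
    stacked = np.stack(frames).astype(np.float32)          # (K, H, W, C)
    mixed = np.tensordot(weights, stacked, axes=1)         # convex combination
    return mixed.astype(np.uint8)
```

Because the weights form a convex combination, the mixed image stays in the valid pixel range without clipping.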