PixelTransformer: Sample Conditioned Signal Generation
Authors: Shubham Tulsiani, Abhinav Gupta
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach across three image datasets and show that we learn to generate diverse and meaningful samples, with the distribution variance reducing given more observed pixels. We also show that our approach is applicable beyond images and can allow generating other types of spatial outputs e.g. polynomials, 3D shapes, and videos. |
| Researcher Affiliation | Collaboration | 1Facebook AI Research 2Carnegie Mellon University. |
| Pseudocode | No | The paper describes computational steps but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project page URL (https://shubhtuls.github.io/PixelTransformer/) but no explicit statement of, or direct link to, a code repository for the described methodology. |
| Open Datasets | Yes | We examine our approach on three different image datasets: CIFAR10 (Krizhevsky, 2009), MNIST (LeCun et al., 1998), and the Cat Faces (Wu et al., 2020) dataset, while using the standard image splits. |
| Dataset Splits | Yes | We examine our approach on three different image datasets: CIFAR10 (Krizhevsky, 2009), MNIST (LeCun et al., 1998), and the Cat Faces (Wu et al., 2020) dataset, while using the standard image splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x) that were used for the implementation or experiments. |
| Experiment Setup | Yes | We vary the number of observed pixels S randomly between 4 and 2048 (with uniform sampling in log-scale), while the number of query samples Q is set to 2048. During training, the locations x are treated as varying over a continuous domain, using bilinear sampling to obtain the corresponding value; this helps our implementation be agnostic to the image resolution in the dataset. While we train a separate network fθ for each dataset, we use the exact same model, hyper-parameters, etc. across them. |
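The sampling scheme quoted in the Experiment Setup row (log-uniform choice of S in [4, 2048], and bilinear sampling of pixel values at continuous locations) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names `sample_num_observed` and `bilinear_sample`, and the toy 4×4 image, are assumptions introduced here for clarity.

```python
import math
import random

random.seed(0)

# Log-uniform sampling of the number of observed pixels S in [4, 2048],
# as described in the paper's setup (the choice of log base is irrelevant).
def sample_num_observed(lo=4, hi=2048):
    return round(math.exp(random.uniform(math.log(lo), math.log(hi))))

# Bilinear sampling of a single-channel image at a continuous location
# (x, y) in [0, 1]^2; this is what makes the pipeline agnostic to the
# underlying image resolution.
def bilinear_sample(img, x, y):
    """img: list of rows (H x W floats); (x, y): normalized coordinates."""
    H, W = len(img), len(img[0])
    px, py = x * (W - 1), y * (H - 1)          # map to pixel space
    x0, y0 = int(px), int(py)                  # top-left neighbour
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = px - x0, py - y0                  # fractional offsets
    # Weighted combination of the four neighbouring pixel values.
    return ((1 - wx) * (1 - wy) * img[y0][x0]
            + wx * (1 - wy) * img[y0][x1]
            + (1 - wx) * wy * img[y1][x0]
            + wx * wy * img[y1][x1])

S = sample_num_observed()                      # 4 <= S <= 2048
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy image
obs = [bilinear_sample(img, random.random(), random.random())
       for _ in range(S)]                      # S observed (location, value) pairs
```

Because the log-uniform draw spreads samples evenly across orders of magnitude, small observation counts (near 4) are seen as often during training as large ones (near 2048), which matches the paper's goal of conditioning on anywhere from a handful of pixels to a dense observation set.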