PixelTransformer: Sample Conditioned Signal Generation

Authors: Shubham Tulsiani, Abhinav Gupta

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We empirically validate our approach across three image datasets and show that we learn to generate diverse and meaningful samples, with the distribution variance reducing given more observed pixels. We also show that our approach is applicable beyond images and can allow generating other types of spatial outputs, e.g. polynomials, 3D shapes, and videos." |
| Researcher Affiliation | Collaboration | ¹Facebook AI Research, ²Carnegie Mellon University. |
| Pseudocode | No | The paper describes computational steps but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project page URL (https://shubhtuls.github.io/PixelTransformer/) but no explicit statement or direct link to a code repository for the methodology described in the paper. |
| Open Datasets | Yes | "We examine our approach on three different image datasets: CIFAR10 (Krizhevsky, 2009), MNIST (LeCun et al., 1998), and the Cat Faces (Wu et al., 2020) dataset, while using the standard image splits." (A loading sketch follows the table.) |
| Dataset Splits | Yes | Same evidence as Open Datasets: the standard train/test splits that each dataset ships with are used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x) used for the implementation or experiments. |
| Experiment Setup | Yes | "We vary the number of observed pixels S randomly between 4 and 2048 (with uniform sampling in log-scale), while the number of query samples Q is set to 2048. During training, the locations x are treated as varying over a continuous domain, using bilinear sampling to obtain the corresponding value; this helps our implementation be agnostic to the image resolution in the dataset. While we train a separate network fθ for each dataset, we use the exact same model, hyper-parameters etc. across them." (A sampling sketch follows the table.) |
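The "standard image splits" noted under Open Datasets correspond to the train/test partitions the datasets ship with. As a minimal sketch, assuming a PyTorch/torchvision setup (the paper does not specify its data-loading code), loading two of the three datasets might look like:

```python
# Minimal sketch of loading the standard splits (assumption: PyTorch/torchvision;
# the paper does not state which data-loading code it used).
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()  # pixel values scaled to [0, 1]

# Standard train/test splits as distributed with each dataset.
cifar_train = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=transform)
mnist_train = torchvision.datasets.MNIST("data", train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.MNIST("data", train=False, download=True, transform=transform)
```

Cat Faces (Wu et al., 2020) is not bundled with torchvision, so it would need a custom loader (e.g., an `ImageFolder`-style dataset pointed at the downloaded images).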
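The Experiment Setup row describes the per-iteration sampling: S observed pixels with S drawn log-uniformly from [4, 2048], a fixed Q = 2048 query pixels, and bilinear sampling at continuous locations so the pipeline is resolution-agnostic. Below is a minimal sketch under those stated numbers, assuming PyTorch; the helper names `sample_num_observed` and `values_at` are illustrative, not from the authors' code.

```python
# Sketch of the per-iteration sampling described under "Experiment Setup".
import math
import torch
import torch.nn.functional as F

Q = 2048  # number of query samples (fixed, per the paper)

def sample_num_observed(s_min=4, s_max=2048):
    """Draw S uniformly in log-scale between s_min and s_max."""
    u = torch.rand(()).item()
    return int(round(math.exp(math.log(s_min) + u * (math.log(s_max) - math.log(s_min)))))

def values_at(image, coords):
    """Bilinearly sample pixel values at continuous locations.

    image:  (C, H, W) tensor; coords: (N, 2) in [0, 1]^2, (x, y) order.
    Returns (N, C) values; sampling in a continuous domain (rather than
    on the pixel grid) is what makes the setup resolution-agnostic.
    """
    grid = (coords * 2 - 1).view(1, 1, -1, 2)       # grid_sample expects [-1, 1]
    vals = F.grid_sample(image.unsqueeze(0), grid,  # output: (1, C, 1, N)
                         mode="bilinear", align_corners=False)
    return vals[0, :, 0].t()                        # (N, C)

# Usage: one training example = S observed pairs + Q query pairs.
image = torch.rand(3, 32, 32)  # e.g., a CIFAR10 image
S = sample_num_observed()
obs_xy, query_xy = torch.rand(S, 2), torch.rand(Q, 2)
obs_v, query_v = values_at(image, obs_xy), values_at(image, query_xy)
```

Because locations live in [0, 1]² rather than on the pixel grid, the same code works unchanged for 28×28 MNIST and 32×32 CIFAR10 images, which is the resolution-agnosticism the quoted passage refers to.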