Image Transformer
Authors: Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art. (The bits/dim metric behind these figures is illustrated after this table.) |
| Researcher Affiliation | Collaboration | 1 Google Brain, Mountain View, USA; 2 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; 3 Work done during an internship at Google Brain; 4 Google AI, Mountain View, USA. |
| Pseudocode | No | The paper includes a diagram (Figure 1) illustrating a slice of the Image Transformer and equations describing its operations, but it does not contain a dedicated pseudocode or algorithm block. A minimal sketch of the masked local self-attention the paper describes appears after this table. |
| Open Source Code | Yes | All code we used to develop, train, and evaluate our models is available in Tensor2Tensor (Vaswani et al., 2018). |
| Open Datasets | Yes | modeling images from the standard ImageNet data set, as measured by log-likelihood. |
| Dataset Splits | Yes | Table 4. Bits/dim on CIFAR-10 test and ImageNet validation sets. The Image Transformer outperforms all models and matches PixelCNN++, achieving a new state-of-the-art on ImageNet. |
| Hardware Specification | Yes | We train our models on both P100 and K40 GPUs, with batch sizes ranging from 1 to 8 per GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' for image resizing and 'Tensor2Tensor' as the framework where the code is available, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For categorical, we use 12 layers with d = 512, heads=4, feed-forward dimension 2048 with a dropout of 0.3. In DMOL, our best config uses 14 layers, d = 256, heads=8, feed-forward dimension 512 and a dropout of 0.2. These settings are restated as a configuration sketch below. |
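
Since the paper has no algorithm block, the following is only a rough illustration rather than the authors' implementation: a single-head, NumPy-only sketch of masked attention restricted to a local window over a flattened pixel sequence, in the spirit of the paper's local 1D self-attention. The per-position sliding window, the dimensions, and the function name are simplifying assumptions; the actual model uses multiple heads, block-wise query/memory partitioning, and 2D variants.

```python
import numpy as np

def local_masked_attention(x, w_q, w_k, w_v, block_len):
    """Single-head masked self-attention restricted to a local window.

    x: (seq_len, d) flattened pixel representations.
    Each position attends only to itself and the previous
    block_len - 1 positions, approximating the paper's local 1D
    attention; the real Image Transformer differs in detail.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Causal + locality mask: position i sees j in (i - block_len, i].
    idx = np.arange(seq_len)
    mask = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < block_len)
    scores = np.where(mask, scores, -1e9)
    # Row-wise softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 16 positions, model dimension 8, local window of 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
w = [rng.standard_normal((8, 8)) for _ in range(3)]
out = local_masked_attention(x, *w, block_len=4)
print(out.shape)  # (16, 8)
```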
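
For readability, the hyperparameters quoted in the Experiment Setup row can be restated as plain Python mappings. The key names below are illustrative labels chosen here, not actual Tensor2Tensor hparams identifiers; only the values come from the paper.

```python
# Best reported configuration with a categorical (softmax) output.
CATEGORICAL_CONFIG = {
    "num_layers": 12,
    "hidden_size": 512,   # d
    "num_heads": 4,
    "filter_size": 2048,  # feed-forward dimension
    "dropout": 0.3,
}

# Best reported configuration with a discretized mixture of logistics (DMOL) output.
DMOL_CONFIG = {
    "num_layers": 14,
    "hidden_size": 256,   # d
    "num_heads": 8,
    "filter_size": 512,   # feed-forward dimension
    "dropout": 0.2,
}
```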
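
As a reminder of the metric behind the 3.83 → 3.77 result quoted above: bits/dim is the model's negative log-likelihood divided by the number of modeled dimensions, converted from nats to bits. A minimal sketch of that conversion follows; the numbers in the usage example are made up for illustration and are not taken from the paper.

```python
import math

def bits_per_dim(nll_nats_total: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats to bits per dimension.

    num_dims is the number of modeled dimensions, e.g. H * W * C
    subpixels for an RGB image.
    """
    return nll_nats_total / (num_dims * math.log(2))

# Hypothetical example: a 32x32 RGB image with a total NLL of 8000 nats.
print(bits_per_dim(8000.0, 32 * 32 * 3))  # ≈ 3.76 bits/dim
```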