Image Transformer

Authors: Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

ICML 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art. (See the bits/dim note below the table.) |
| Researcher Affiliation | Collaboration | ¹Google Brain, Mountain View, USA; ²Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; ³Work done during an internship at Google Brain; ⁴Google AI, Mountain View, USA. |
| Pseudocode | No | The paper includes a diagram (Figure 1) illustrating a slice of the Image Transformer, along with equations describing its operations, but it contains no dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | All code we used to develop, train, and evaluate our models is available in Tensor2Tensor (Vaswani et al., 2018). |
| Open Datasets | Yes | modeling images from the standard ImageNet data set, as measured by log-likelihood. |
| Dataset Splits | Yes | Table 4. Bits/dim on CIFAR-10 test and ImageNet validation sets. The Image Transformer outperforms all models and matches PixelCNN++, achieving a new state of the art on ImageNet. |
| Hardware Specification | Yes | We train our models on both P100 and K40 GPUs, with batch sizes ranging from 1 to 8 per GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' for image resizing and 'Tensor2Tensor' as the framework where the code is available, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For categorical, we use 12 layers with d = 512, heads = 4, feed-forward dimension 2048, and a dropout of 0.3. In DMOL, our best config uses 14 layers, d = 256, heads = 8, feed-forward dimension 512, and a dropout of 0.2. (Restated as a config sketch below the table.) |
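The log-likelihoods quoted in the Research Type and Dataset Splits rows (3.83 and 3.77 on ImageNet) are reported in bits per dimension, the standard metric for pixel-level density models. As a minimal sketch of the conversion, assuming a total negative log-likelihood measured in nats over a 32x32 RGB image (the numeric values here are illustrative, not taken from the paper):

```python
import math

def bits_per_dim(total_nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats to bits per dimension."""
    return total_nll_nats / (num_dims * math.log(2.0))

# Illustrative numbers only: a 32x32 RGB image has 32 * 32 * 3 = 3072
# dimensions, so a total NLL of about 8030 nats works out to
# 8030 / (3072 * ln 2) ≈ 3.77 bits/dim, the figure reported above.
print(bits_per_dim(8030.0, 32 * 32 * 3))  # ≈ 3.77
```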
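For reference, the two configurations quoted in the Experiment Setup row can be restated as plain Python dictionaries. This is a hypothetical sketch: the key names below are illustrative and are not necessarily the hparams identifiers used in Tensor2Tensor.

```python
# Hypothetical restatement of the reported hyperparameters; key names are
# illustrative, not Tensor2Tensor's actual hparams identifiers.
categorical_config = {
    "num_layers": 12,    # decoder layers
    "hidden_size": 512,  # model dimension d
    "num_heads": 4,      # attention heads
    "filter_size": 2048, # feed-forward (inner) dimension
    "dropout": 0.3,
}

dmol_config = {          # discretized mixture of logistics output
    "num_layers": 14,
    "hidden_size": 256,
    "num_heads": 8,
    "filter_size": 512,
    "dropout": 0.2,
}
```

Per the Hardware Specification row, these models were trained on P100 and K40 GPUs with batch sizes of 1 to 8 per GPU.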