Colorization Transformer
Authors: Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ColTran on colorizing 256×256 grayscale images from the ImageNet dataset (Russakovsky et al., 2015). We train the ColTran core, color and spatial upsamplers independently on 16 TPUv2 chips with a batch-size of 224, 768 and 32 for 600K, 450K and 300K steps respectively. We compute FID using colorizations of 5000 grayscale images of resolution 256×256 from the ImageNet validation set. |
| Researcher Affiliation | Industry | Manoj Kumar, Dirk Weissenborn & Nal Kalchbrenner Google Research, Brain Team {mechcoder,diwe,nalk}@google.com |
| Pseudocode | No | The paper describes the model architecture and processes using text and mathematical equations but does not contain structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | Yes | Our implementation is open-sourced in the google-research framework at https://github.com/google-research/google-research/tree/master/coltran with a zip-compressed version also available. |
| Open Datasets | Yes | We evaluate ColTran on colorizing 256×256 grayscale images from the ImageNet dataset (Russakovsky et al., 2015). |
| Dataset Splits | Yes | We set apart 10000 images from the training set as a holdout set to tune hyperparameters and perform ablations. |
| Hardware Specification | Yes | We train the ColTran core, color and spatial upsamplers independently on 16 TPUv2 chips with a batch-size of 224, 768 and 32 for 600K, 450K and 300K steps respectively. ColTran core can sample a batch of 20 64×64 grayscale images in around 3.5-5 minutes on a P100 GPU vs. PixColor, which takes 10 minutes to colorize 28×28 grayscale images on a K40 GPU. |
| Software Dependencies | No | The paper mentions using RMSprop as an optimizer but does not provide specific version numbers for software components, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | We train the ColTran core, color and spatial upsamplers independently on 16 TPUv2 chips with a batch-size of 224, 768 and 32 for 600K, 450K and 300K steps respectively. We use 4 axial attention blocks in each component of our architecture, with a hidden size of 512 and 4 heads. We use RMSprop (Tieleman & Hinton, 2012) with a fixed learning rate of 3e-4. |
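
The Experiment Setup row above quotes the architecture and optimizer hyperparameters. The following is a minimal sketch of how those reported values could be collected into a training configuration, assuming a TensorFlow-style setup; the dictionary names and the `make_optimizer` helper are illustrative and are not taken from the ColTran repository.

```python
import tensorflow as tf

# Per-component batch sizes and training steps quoted in the table
# (ColTran core, color upsampler, spatial upsampler).
TRAIN_CONFIG = {
    "coltran_core":      {"batch_size": 224, "train_steps": 600_000},
    "color_upsampler":   {"batch_size": 768, "train_steps": 450_000},
    "spatial_upsampler": {"batch_size": 32,  "train_steps": 300_000},
}

# Shared architecture hyperparameters: 4 axial attention blocks,
# hidden size 512, 4 attention heads.
MODEL_CONFIG = {"num_blocks": 4, "hidden_size": 512, "num_heads": 4}

def make_optimizer():
    # RMSprop with a fixed learning rate of 3e-4, as reported in the paper.
    return tf.keras.optimizers.RMSprop(learning_rate=3e-4)
```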
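
The Research Type row states that FID is computed over colorizations of 5000 grayscale 256×256 images from the ImageNet validation set. Below is a hedged sketch of that evaluation loop; `load_grayscale_validation`, `colorize`, and `compute_fid` are hypothetical placeholders for the data pipeline, the trained model's sampling step, and an FID implementation, none of which are named in the excerpt.

```python
import numpy as np

NUM_EVAL_IMAGES = 5000   # number of validation images used for FID
RESOLUTION = 256         # evaluation resolution (256x256)

def evaluate_fid(model, load_grayscale_validation, colorize, compute_fid):
    """Score colorizations with FID, following the protocol quoted above.

    All three callables are hypothetical stand-ins: a loader for grayscale
    ImageNet validation images (with their color references), the model's
    colorization/sampling routine, and an FID metric comparing samples
    against the reference color images.
    """
    grayscale, reference_color = load_grayscale_validation(
        num_images=NUM_EVAL_IMAGES, resolution=RESOLUTION)
    colorized = np.stack([colorize(model, g) for g in grayscale])
    return compute_fid(colorized, reference_color)
```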