Colorization Transformer
Authors: Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ColTran on colorizing 256×256 grayscale images from the ImageNet dataset (Russakovsky et al., 2015). We train the ColTran core, color and spatial upsamplers independently on 16 TPUv2 chips with a batch-size of 224, 768 and 32 for 600K, 450K and 300K steps respectively. We compute FID using colorizations of 5000 grayscale images of resolution 256×256 from the ImageNet validation set. |
| Researcher Affiliation | Industry | Manoj Kumar, Dirk Weissenborn & Nal Kalchbrenner Google Research, Brain Team {mechcoder,diwe,nalk}@google.com |
| Pseudocode | No | The paper describes the model architecture and processes using text and mathematical equations but does not contain structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | Yes | Our implementation is open-sourced in the google-research framework at https://github.com/google-research/google-research/tree/master/coltran with a zip-compressed version also available. |
| Open Datasets | Yes | We evaluate ColTran on colorizing 256×256 grayscale images from the ImageNet dataset (Russakovsky et al., 2015). |
| Dataset Splits | Yes | We set apart 10000 images from the training set as a holdout set to tune hyperparameters and perform ablations. |
| Hardware Specification | Yes | We train the ColTran core, color and spatial upsamplers independently on 16 TPUv2 chips with a batch-size of 224, 768 and 32 for 600K, 450K and 300K steps respectively. ColTran core can sample a batch of 20 64×64 grayscale images in around 3.5-5 minutes on a P100 GPU vs. PixColor, which takes 10 minutes to colorize 28×28 grayscale images on a K40 GPU. |
| Software Dependencies | No | The paper mentions using RMSprop as an optimizer but does not provide specific version numbers for software components, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | We train the ColTran core, color and spatial upsamplers independently on 16 TPUv2 chips with a batch-size of 224, 768 and 32 for 600K, 450K and 300K steps respectively. We use 4 axial attention blocks in each component of our architecture, with a hidden size of 512 and 4 heads. We use RMSprop (Tieleman & Hinton, 2012) with a fixed learning rate of 3e-4. |
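
The Experiment Setup row above quotes the architecture and optimizer hyperparameters. The following is a minimal sketch of how those reported values could be collected into a training configuration, assuming a TensorFlow-style setup; the dictionary names and the `make_optimizer` helper are illustrative and are not taken from the ColTran repository.

```python
import tensorflow as tf

# Per-component batch sizes and training steps quoted in the table
# (ColTran core, color upsampler, spatial upsampler).
TRAIN_CONFIG = {
    "coltran_core":      {"batch_size": 224, "train_steps": 600_000},
    "color_upsampler":   {"batch_size": 768, "train_steps": 450_000},
    "spatial_upsampler": {"batch_size": 32,  "train_steps": 300_000},
}

# Shared architecture hyperparameters: 4 axial attention blocks,
# hidden size 512, 4 attention heads.
MODEL_CONFIG = {"num_blocks": 4, "hidden_size": 512, "num_heads": 4}

def make_optimizer():
    # RMSprop with a fixed learning rate of 3e-4, as reported in the paper.
    return tf.keras.optimizers.RMSprop(learning_rate=3e-4)
```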
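
The Research Type row states that FID is computed over colorizations of 5000 grayscale 256×256 images from the ImageNet validation set. Below is a hedged sketch of that evaluation loop; `load_grayscale_validation`, `colorize`, and `compute_fid` are hypothetical placeholders for the data pipeline, the trained model's sampling step, and an FID implementation, none of which are named in the excerpt.

```python
import numpy as np

NUM_EVAL_IMAGES = 5000   # number of validation images used for FID
RESOLUTION = 256         # evaluation resolution (256x256)

def evaluate_fid(model, load_grayscale_validation, colorize, compute_fid):
    """Score colorizations with FID, following the protocol quoted above.

    All three callables are hypothetical stand-ins: a loader for grayscale
    ImageNet validation images (with their color references), the model's
    colorization/sampling routine, and an FID metric comparing samples
    against the reference color images.
    """
    grayscale, reference_color = load_grayscale_validation(
        num_images=NUM_EVAL_IMAGES, resolution=RESOLUTION)
    colorized = np.stack([colorize(model, g) for g in grayscale])
    return compute_fid(colorized, reference_color)
```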