TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

Authors: Zhaoyan Liu, Noël Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed, all while retaining a much higher level of generality. We compare TR0N against FuseDream on the MS-COCO dataset, which contains text/image pairs. For each text, we generate a corresponding image with both methods, and then compute both the FID and augmented CLIP score. Results are displayed in Figure 5 for various computational budgets. Table 1 also includes some ablations: (i) removing the error correction (Langevin dynamics) step altogether, which results in heavily degraded FID and IS for the NVAE-based model, and much worse conditioning for both models. (See the Langevin-dynamics sketch below.)
Researcher Affiliation | Collaboration | Layer 6 AI, Toronto, Canada; University of Toronto, Toronto, Canada; Vector Institute, Toronto, Canada.
Pseudocode | Yes | Algorithm 1 (TR0N training) and Algorithm 2 (TR0N sampling) on page 4. (See the sampling sketch below.)
Open Source Code | Yes | Our code is available at https://github.com/layer6ai-labs/tr0n.
Open Datasets | Yes | We demonstrate TR0N's ability to make an unconditional model on CIFAR-10 (Krizhevsky, 2009) into a class-conditional one. We compare TR0N against FuseDream on the MS-COCO dataset, which contains text/image pairs. ... both pre-trained on FFHQ (Karras et al., 2019).
Dataset Splits | Yes | We note that to compute all FID scores in Figure 5, we use the entire validation set of MS-COCO, which contains 40k text/image pairs. We demonstrate TR0N's ability to make an unconditional model on CIFAR-10 (Krizhevsky, 2009) into a class-conditional one. (See the FID sketch below.)
Hardware Specification | Yes | The experiments that required timing were all run on an NVIDIA TITAN RTX.
Software Dependencies | No | All other translator weights are randomly initialized using the default PyTorch (Paszke et al., 2019) linear layer initializer. The paper mentions PyTorch but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We train the translator for 10 epochs on said synthetic dataset with a batch size B = 16. We thus use ADAM to optimize the translator network with a learning rate of 10^-4 and a cosine scheduler to anneal the learning rate to 0 throughout training. We set the momentum to 0.99 and add noise with λ = 10^-4 in all experiments. Unless otherwise stated, we use T = 100 steps of Langevin dynamics. (See the training-setup sketch below.)
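
The "error correction" referenced in the Research Type row is described in the paper as Langevin dynamics over the generator's latent space. A minimal sketch is given below, assuming a differentiable scalar energy_fn (e.g., a CLIP-based mismatch between the generated image and the condition); the function names and step sizes are illustrative assumptions, not the authors' implementation, which additionally uses momentum.

```python
import torch

def langevin_refine(z_init, energy_fn, steps=100, step_size=1e-2, noise_scale=1e-4):
    """Illustrative Langevin-style refinement of a latent code toward lower energy."""
    z = z_init.clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_fn(z)                       # scalar, e.g. mismatch between G(z) and the condition
        grad, = torch.autograd.grad(energy, z)
        with torch.no_grad():
            z -= step_size * grad                   # gradient step toward lower energy
            z += noise_scale * torch.randn_like(z)  # Langevin noise term
    return z.detach()
```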
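
The sampling procedure (Algorithm 2) can be sketched roughly as: embed the condition, map the embedding to a latent of the frozen generator with the translator, optionally refine the latent, and decode. All component names below are hypothetical placeholders for the paper's pre-trained models, not its actual code.

```python
import torch

def tr0n_sample(text, clip_text_encoder, translator, generator, refine=None):
    """Zero-shot conditional sampling sketch: condition -> translator -> latent -> image."""
    with torch.no_grad():
        c = clip_text_encoder(text)   # condition embedding, e.g. a CLIP text embedding
        z = translator(c)             # translator predicts a latent of the frozen generator
    if refine is not None:            # optional Langevin "error correction" step
        z = refine(z)                 # e.g. langevin_refine from the sketch above
    with torch.no_grad():
        return generator(z)           # decode with the pre-trained unconditional generator
```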
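
As one illustration of how an FID score could be computed over the 40k MS-COCO validation pairs, the sketch below uses torchmetrics; the paper does not state which FID implementation it relies on, and coco_val_loader and generate_from_text are hypothetical placeholders.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)      # expects uint8 images of shape (N, 3, H, W)

for real_images, captions in coco_val_loader:     # hypothetical loader over the 40k validation pairs
    fake_images = generate_from_text(captions)    # hypothetical: e.g. tr0n_sample per caption
    fid.update(real_images, real=True)
    fid.update(fake_images, real=False)

print(float(fid.compute()))
```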
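
A minimal sketch of the stated optimization setup (10 epochs, batch size 16, ADAM with learning rate 10^-4, cosine annealing to 0) follows; the translator architecture, tensor shapes, and regression loss are illustrative placeholders rather than the paper's exact objective, and the momentum and noise values quoted above refer to the Langevin step, not to this optimizer.

```python
import torch

# Illustrative synthetic dataset of (condition embedding, latent) pairs; shapes are placeholders.
c_data, z_data = torch.randn(1024, 512), torch.randn(1024, 128)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(c_data, z_data), batch_size=16, shuffle=True
)

# Placeholder translator; the paper states linear layers use the default PyTorch initializer.
translator = torch.nn.Sequential(
    torch.nn.Linear(512, 256), torch.nn.ReLU(), torch.nn.Linear(256, 128)
)

epochs = 10
optimizer = torch.optim.Adam(translator.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs * len(loader), eta_min=0.0  # anneal the learning rate to 0
)

for _ in range(epochs):
    for c, z in loader:
        loss = torch.nn.functional.mse_loss(translator(c), z)  # illustrative loss, not the paper's
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```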