You Only Train Once: Loss-Conditional Training of Deep Networks

Authors: Alexey Dosovitskiy, Josip Djolonga

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method both quantitatively and qualitatively on three problems with multi-term loss functions: β-VAE, learned image compression, and fast style transfer. (Section 4, Experiments)
Researcher Affiliation | Industry | Alexey Dosovitskiy & Josip Djolonga, Google Research, Brain Team, {adosovitskiy, josipd}@google.com
Pseudocode | No | The paper describes the method in prose and equations but does not include any pseudocode or algorithm blocks. (A hedged sketch of the training step appears after this table.)
Open Source Code | Yes | The code will be released at www.github.com/google-research/google-research/yoto.
Open Datasets | Yes | We consider two settings: the CIFAR-10 dataset (Krizhevsky, 2009) with Gaussian outputs, and the Shapes3D dataset (Burgess & Kim, 2018) with Bernoulli outputs. We evaluate the compression models on two datasets: Kodak (Kodak, 1993) and Tecnick (Asuni & Giachetti, 2014). We sample the content images from ImageNet (Deng et al., 2009) and use 14 pointillism paintings as the style images. (The two output distributions are sketched after this table.)
Dataset Splits | Yes | We select the fixed β so that it minimizes the average validation loss over all β values. Figure 7: Qualitative comparison of image stylization models on an image from the validation set of ImageNet. (See the β-selection snippet after this table.)
Hardware Specification | No | The paper mentions training "on a single CPU core" when timing one specific comparison, but it does not give concrete hardware details (e.g., GPU or CPU models) for the main experiments.
Software Dependencies | No | The paper mentions non-linearities and optimization techniques but does not name specific libraries or version numbers (e.g., PyTorch, TensorFlow, or Python versions).
Experiment Setup | Yes | On Shapes3D we train all models for a total of 600,000 mini-batch iterations, and we multiply the learning rate by 0.5 after 300,000, 390,000, 480,000, and 570,000 iterations. We tuned the learning rates by sweeping over the values {5·10⁻⁵, 1·10⁻⁴, 2·10⁻⁴, 4·10⁻⁴, 8·10⁻⁴} and ended up using 1·10⁻⁴ on CIFAR-10 and 2·10⁻⁴ on Shapes3D. We use mini-batches of 128 samples on CIFAR-10 and 64 samples on Shapes3D. We use weight decay of 10⁻⁵ in all models. (See the schedule sketch after this table.)
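
Since the paper itself contains no pseudocode, the following is a minimal sketch of loss-conditional training as the paper describes it in prose: a loss weight λ is sampled per example (the paper uses a log-uniform distribution for the β-VAE experiments), passed to the network as a conditioning input (the paper conditions intermediate features via FiLM), and used to weight the loss terms. All names here (sample_log_uniform, yoto_train_step, model) and the λ range are illustrative assumptions, not the paper's exact code.

```python
import math
import torch

def sample_log_uniform(batch_size, lo=1e-2, hi=1e2):
    # Sample loss weights log-uniformly over [lo, hi]; the range is an
    # illustrative assumption, not the paper's exact setting.
    u = torch.rand(batch_size)
    return torch.exp(math.log(lo) + u * (math.log(hi) - math.log(lo)))

def yoto_train_step(model, optimizer, x):
    # One loss-conditional training step for a beta-VAE-style objective.
    lam = sample_log_uniform(x.size(0))
    # The model receives lambda as an extra conditioning input
    # (the paper conditions intermediate activations on lambda via FiLM).
    recon_nll, kl = model(x, lam)          # hypothetical per-example terms
    loss = (recon_nll + lam * kl).mean()   # lambda-weighted multi-term loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time, the same trained model can then be evaluated at any desired λ simply by passing that value as the conditioning input, which is the paper's central point.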
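The "Gaussian outputs" and "Bernoulli outputs" in the datasets row refer to the decoder's output distribution, which fixes the form of the reconstruction term. A sketch of the two negative log-likelihoods under the usual formulations; the function names are hypothetical:

```python
import torch
import torch.nn.functional as F

def gaussian_nll(x, mean, log_var):
    # Per-example Gaussian reconstruction NLL, up to an additive constant.
    nll = 0.5 * ((x - mean) ** 2 * torch.exp(-log_var) + log_var)
    return nll.flatten(1).sum(dim=1)

def bernoulli_nll(x, logits):
    # Per-example Bernoulli reconstruction NLL (sigmoid cross-entropy),
    # summed over pixels.
    ce = F.binary_cross_entropy_with_logits(logits, x, reduction="none")
    return ce.flatten(1).sum(dim=1)
```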
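The fixed-β baseline selection quoted in the dataset-splits row amounts to an argmin over the candidate β values. A one-function sketch, where val_losses is a hypothetical mapping from each β to its validation losses:

```python
import numpy as np

def select_fixed_beta(val_losses):
    # val_losses: dict mapping each candidate beta to a list of validation
    # losses; pick the beta that minimizes the average validation loss.
    return min(val_losses, key=lambda beta: np.mean(val_losses[beta]))
```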
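The Shapes3D schedule quoted above (600,000 iterations, halving the learning rate after 300k/390k/480k/570k iterations) maps directly onto a step schedule. A sketch in PyTorch, assuming the schedule is stepped once per mini-batch iteration; the model below is a stand-in, and routing weight decay through Adam's argument is an assumption:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(8, 8)  # placeholder for the actual network
# Learning rate 2e-4 and weight decay 1e-5, as reported for Shapes3D.
optimizer = Adam(model.parameters(), lr=2e-4, weight_decay=1e-5)
# Multiply the learning rate by 0.5 at the stated iteration milestones.
scheduler = MultiStepLR(
    optimizer, milestones=[300_000, 390_000, 480_000, 570_000], gamma=0.5
)

for step in range(600_000):
    # ... forward/backward/optimizer.step() for one mini-batch ...
    scheduler.step()  # advance once per iteration, not per epoch
```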