You Only Train Once: Loss-Conditional Training of Deep Networks
Authors: Alexey Dosovitskiy, Josip Djolonga
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method both quantitatively and qualitatively on three problems with multi-term loss functions: β-VAE, learned image compression, and fast style transfer. (Section 4, Experiments) |
| Researcher Affiliation | Industry | Alexey Dosovitskiy & Josip Djolonga Google Research, Brain Team {adosovitskiy, josipd}@google.com |
| Pseudocode | No | The paper describes the method in prose and equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be released at www.github.com/google-research/google-research/yoto. |
| Open Datasets | Yes | We consider two settings: the CIFAR-10 dataset (Krizhevsky, 2009) with Gaussian outputs, and the Shapes3D dataset (Burgess & Kim, 2018) with Bernoulli outputs. We evaluate the compression models on two datasets: Kodak (Kodak, 1993) and Tecnick (Asuni & Giachetti, 2014). We sample the content images from ImageNet (Deng et al., 2009) and use 14 pointillism paintings as the style images. |
| Dataset Splits | Yes | We select the fixed β so that it minimizes the average validation loss over all β values. Figure 7: Qualitative comparison of image stylization models on an image from the validation set of ImageNet. |
| Hardware Specification | No | The paper mentions "on a single CPU core" in the context of timing a specific comparison, but it does not provide specific hardware details (like GPU models or CPU models with speeds) for the main experiments. |
| Software Dependencies | No | The paper mentions non-linearities and optimization techniques but does not provide specific version numbers for software dependencies or libraries used (e.g., PyTorch, TensorFlow, Python versions). |
| Experiment Setup | Yes | On Shapes3D we train all models for a total of 600,000 mini-batch iterations, and we multiply the learning rate by 0.5 after 300,000, 390,000, 480,000, and 570,000 iterations. We tuned the learning rates by sweeping over the values {5·10⁻⁵, 1·10⁻⁴, 2·10⁻⁴, 4·10⁻⁴, 8·10⁻⁴} and ended up using the learning rates 1·10⁻⁴ on CIFAR-10 and 2·10⁻⁴ on Shapes3D. We use mini-batches of 128 samples on CIFAR-10 and 64 samples on Shapes3D. We use weight decay of 10⁻⁵ in all models. (See the schedule sketch after the table.) |
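
For concreteness, the Shapes3D schedule quoted in the Experiment Setup row can be written as a piecewise-constant learning-rate rule. The snippet below is a minimal sketch under stated assumptions, not the authors' released code: the paper does not name a framework or optimizer, and the function name `lr_at_step` and the constant names are hypothetical; only the numeric values are taken from the quoted setup.

```python
# Hypothetical sketch of the quoted Shapes3D training schedule.
# Function and constant names are assumptions; numbers come from the paper's quoted setup.

def lr_at_step(step: int,
               base_lr: float = 2e-4,  # quoted base learning rate for Shapes3D
               milestones=(300_000, 390_000, 480_000, 570_000),
               decay: float = 0.5) -> float:
    """Piecewise-constant schedule: multiply the rate by `decay` at each milestone."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= decay
    return lr

# Other quoted hyperparameters, collected for reference.
TOTAL_STEPS_SHAPES3D = 600_000
SHAPES3D_BATCH_SIZE = 64
CIFAR10_BATCH_SIZE = 128
CIFAR10_BASE_LR = 1e-4
WEIGHT_DECAY = 1e-5
LR_SWEEP = (5e-5, 1e-4, 2e-4, 4e-4, 8e-4)

# Example: after 400,000 iterations two decays have applied, so 2e-4 * 0.5 * 0.5 = 5e-5.
assert abs(lr_at_step(400_000) - 5e-5) < 1e-12
```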