Complement Objective Training
Authors: Hao-Yun Chen, Pei-Hsin Wang, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. The experimental results confirm that, compared to the conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks. |
| Researcher Affiliation | Collaboration | Hao-Yun Chen1, Pei-Hsin Wang1, Chun-Hao Liu1, Shih-Chieh Chang1,2, Jia-Yu Pan3, Yu-Ting Chen3, Wei Wei3, and Da-Cheng Juan3. 1Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan; 2Electronic and Optoelectronic System Research Laboratories, ITRI, Hsinchu, Taiwan; 3Google Research, Mountain View, CA, USA |
| Pseudocode | Yes | Algorithm 1: Training by alternating between primary and complement objectives. for t = 1 to n_train_steps do: 1. Update parameters by Primary Objective: $\frac{1}{N}\sum_{i=1}^{N}\log(\hat{y}_{ig})$; 2. Update parameters by Complement Objective: $\frac{1}{N}\sum_{i=1}^{N}\mathcal{H}(\hat{y}_{i\bar{c}})$ (a training-loop sketch follows the table). |
| Open Source Code | Yes | Our code is available at https://github.com/henry8527/COT. |
| Open Datasets | Yes | We consider the following datasets for experiments with image classification: CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet and ImageNet-2012. [...] IWSLT 2015 English-Vietnamese (Cettolo et al., 2015) [...] Google Commands Dataset (Warden, 2018) |
| Dataset Splits | Yes | For validation and testing, we use TED tst2012 and TED tst2013, respectively. |
| Hardware Specification | No | The paper does not specify any particular CPU or GPU models, or other hardware specifications used for running the experiments. |
| Software Dependencies | Yes | For the baseline implementation, we follow the official TensorFlow NMT implementation3. That is, the number of total training steps is 12,000, and the weight decay starts at the 8,000th step and is then applied every 1,000 steps. (footnote 3: https://github.com/tensorflow/nmt/tree/tf-1.4) |
| Experiment Setup | Yes | Specifically, the models are trained using the SGD optimizer with momentum of 0.9. Weight decay is set to 0.0001 and the learning rate starts at 0.1, then is divided by 10 at the 100th and 150th epochs. The models are trained for 200 epochs, with mini-batches of size 128 (a configuration sketch follows the table). |
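The alternating update of Algorithm 1 (quoted in the Pseudocode row) can be illustrated as follows. This is a minimal sketch, not the authors' implementation (their code is at https://github.com/henry8527/COT); the helper names, the single shared optimizer, and the exact normalization inside the complement entropy are assumptions made for illustration.

```python
# A minimal PyTorch sketch of Algorithm 1: alternate a cross-entropy update
# with a complement-entropy update on each mini-batch. Helper names and the
# complement-entropy normalization are assumptions; see the authors' repo at
# https://github.com/henry8527/COT for the reference implementation.
import torch
import torch.nn.functional as F

def complement_entropy(logits, targets, eps=1e-7):
    """Entropy of the predicted distribution restricted to the complement
    (non-ground-truth) classes, averaged over the batch."""
    probs = F.softmax(logits, dim=1)                    # \hat{y}_i
    gt = probs.gather(1, targets.unsqueeze(1))          # \hat{y}_{ig}
    comp = probs / (1.0 - gt + eps)                     # renormalize over c != g
    comp = comp.scatter(1, targets.unsqueeze(1), 0.0)   # drop ground-truth class
    return -(comp * torch.log(comp + eps)).sum(dim=1).mean()

def train_one_epoch(model, loader, optimizer, device):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # 1. Update parameters by the primary objective (cross-entropy).
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()
        # 2. Update parameters by the complement objective: maximize the
        #    complement entropy, i.e. minimize its negative.
        optimizer.zero_grad()
        (-complement_entropy(model(x), y)).backward()
        optimizer.step()
```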
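The image-classification setup quoted in the Experiment Setup row (SGD with momentum 0.9, weight decay 0.0001, learning rate 0.1 divided by 10 at the 100th and 150th epochs, 200 epochs, mini-batches of 128) maps onto a standard optimizer/scheduler configuration. The sketch below assumes PyTorch; `model` and `train_set` are hypothetical placeholders, not names from the paper.

```python
# Sketch of the reported image-classification training configuration,
# assuming PyTorch; `model` and `train_set` are hypothetical placeholders.
import torch
from torch.utils.data import DataLoader

def build_training(model, train_set):
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    # SGD with momentum 0.9, weight decay 1e-4, initial learning rate 0.1.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Learning rate divided by 10 at the 100th and 150th epochs; 200 epochs total.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1)
    num_epochs = 200
    return loader, optimizer, scheduler, num_epochs
```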