Complement Objective Training

Authors: Hao-Yun Chen, Pei-Hsin Wang, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. The experimental results confirm that, compared to the conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks.
Researcher Affiliation | Collaboration | Hao-Yun Chen (1), Pei-Hsin Wang (1), Chun-Hao Liu (1), Shih-Chieh Chang (1, 2), Jia-Yu Pan (3), Yu-Ting Chen (3), Wei Wei (3), and Da-Cheng Juan (3); (1) Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan; (2) Electronic and Optoelectronic System Research Laboratories, ITRI, Hsinchu, Taiwan; (3) Google Research, Mountain View, CA, USA
Pseudocode | Yes | Algorithm 1: Training by alternating between primary and complement objectives. For t = 1 to n_train_steps do: 1. Update parameters by Primary Objective: $\frac{1}{N}\sum_{i=1}^{N}\log(\hat{y}_{ig})$; 2. Update parameters by Complement Objective: $\frac{1}{N}\sum_{i=1}^{N}\mathcal{H}(\hat{y}_{i\bar{c}})$. (A runnable sketch of this loop is given after the table.)
Open Source Code | Yes | Our code is available at https://github.com/henry8527/COT.
Open Datasets | Yes | We consider the following datasets for experiments with image classification: CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet and ImageNet-2012. [...] IWSLT 2015 English-Vietnamese (Cettolo et al., 2015) [...] Google Commands Dataset (Warden, 2018)
Dataset Splits | Yes | For validation and testing, we use TED tst2012 and TED tst2013, respectively.
Hardware Specification | No | The paper does not specify any particular CPU or GPU models, or other hardware specifications used for running the experiments.
Software Dependencies | Yes | For the baseline implementation, we follow the official TensorFlow-NMT implementation (https://github.com/tensorflow/nmt/tree/tf-1.4). That is, the number of total training steps is 12,000 and the weight decay starts at the 8,000th step then applied for every 1,000 steps.
Experiment Setup | Yes | Specifically, the models are trained using SGD optimizer with momentum of 0.9. Weight decay is set to be 0.0001 and learning rate starts at 0.1, then being divided by 10 at the 100th and 150th epoch. The models are trained for 200 epochs, with mini-batches of size 128. (A configuration sketch for this setup follows the table.)
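
The pseudocode row above alternates two gradient updates per batch: one on the cross-entropy primary objective and one on the complement entropy, which flattens the predicted probability mass over the non-ground-truth classes. The following is a minimal PyTorch sketch of that loop, assuming a standard softmax classifier; the complement_entropy helper, its eps constants, and the train_epoch wrapper are our own naming and simplifications, not the authors' released implementation (see the linked repository for the official code).

import torch
import torch.nn.functional as F


def complement_entropy(logits, target, eps=1e-7):
    """Mean Shannon entropy of the predicted distribution restricted to the
    complement classes (all classes except the ground truth), renormalized
    so the complement probabilities sum to one per sample."""
    probs = F.softmax(logits, dim=1)                        # (N, K) class probabilities
    gt_prob = probs.gather(1, target.unsqueeze(1))          # (N, 1) ground-truth probability
    comp = probs / (1.0 - gt_prob + eps)                    # renormalize over complement classes
    mask = torch.ones_like(probs).scatter_(1, target.unsqueeze(1), 0.0)  # drop ground-truth entry
    entropy = -(comp * torch.log(comp + eps) * mask).sum(dim=1)
    return entropy.mean()


def train_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of COT-style training: two parameter updates per mini-batch."""
    model.train()
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # 1. Update parameters by the primary objective (cross entropy).
        optimizer.zero_grad()
        F.cross_entropy(model(inputs), labels).backward()
        optimizer.step()

        # 2. Update parameters by the complement objective: maximize the
        #    complement entropy by minimizing its negative.
        optimizer.zero_grad()
        (-complement_entropy(model(inputs), labels)).backward()
        optimizer.step()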
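
For the image-classification setup quoted in the last row, the optimizer and learning-rate schedule can be expressed as the sketch below. The hyperparameter values (SGD with momentum 0.9, weight decay 0.0001, initial learning rate 0.1 divided by 10 at the 100th and 150th epochs, 200 epochs, mini-batches of 128) come from the quoted text; the placeholder model and the commented call into the training loop above are illustrative assumptions only.

import torch
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder network; the paper trains state-of-the-art CNNs on these settings.
model = torch.nn.Linear(3 * 32 * 32, 10)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,            # initial learning rate
                            momentum=0.9,
                            weight_decay=1e-4)

# Divide the learning rate by 10 at the 100th and 150th epochs.
scheduler = MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

BATCH_SIZE = 128   # mini-batch size used when building the DataLoader
NUM_EPOCHS = 200

for epoch in range(NUM_EPOCHS):
    # train_epoch(model, loader, optimizer)  # COT loop sketched above (loader not shown)
    scheduler.step()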