Complement Objective Training

Authors: Hao-Yun Chen, Pei-Hsin Wang, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. The experimental results confirm that, compared to the conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks.
Researcher Affiliation | Collaboration | Hao-Yun Chen (1), Pei-Hsin Wang (1), Chun-Hao Liu (1), Shih-Chieh Chang (1, 2), Jia-Yu Pan (3), Yu-Ting Chen (3), Wei Wei (3), and Da-Cheng Juan (3); (1) Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan; (2) Electronic and Optoelectronic System Research Laboratories, ITRI, Hsinchu, Taiwan; (3) Google Research, Mountain View, CA, USA
Pseudocode | Yes | Algorithm 1: Training by alternating between primary and complement objectives. For t = 1 to n_train_steps do: 1. Update parameters by Primary Objective: $\frac{1}{N}\sum_{i=1}^{N}\log(\hat{y}_{ig})$; 2. Update parameters by Complement Objective: $\frac{1}{N}\sum_{i=1}^{N}\mathcal{H}(\hat{y}_{i\bar{c}})$. (A runnable sketch of this loop is given after the table.)
Open Source Code | Yes | Our code is available at https://github.com/henry8527/COT.
Open Datasets | Yes | We consider the following datasets for experiments with image classification: CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet and ImageNet-2012. [...] IWSLT 2015 English-Vietnamese (Cettolo et al., 2015) [...] Google Commands Dataset (Warden, 2018)
Dataset Splits | Yes | For validation and testing, we use TED tst2012 and TED tst2013, respectively.
Hardware Specification | No | The paper does not specify any particular CPU or GPU models, or other hardware specifications used for running the experiments.
Software Dependencies | Yes | For the baseline implementation, we follow the official TensorFlow-NMT implementation (https://github.com/tensorflow/nmt/tree/tf-1.4). That is, the number of total training steps is 12,000 and the weight decay starts at the 8,000th step then applied for every 1,000 steps.
Experiment Setup | Yes | Specifically, the models are trained using SGD optimizer with momentum of 0.9. Weight decay is set to be 0.0001 and learning rate starts at 0.1, then being divided by 10 at the 100th and 150th epoch. The models are trained for 200 epochs, with mini-batches of size 128. (A configuration sketch for this setup follows the table.)
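
The pseudocode row above alternates two gradient updates per batch: one on the cross-entropy primary objective and one on the complement entropy, which flattens the predicted probability mass over the non-ground-truth classes. The following is a minimal PyTorch sketch of that loop, assuming a standard softmax classifier; the complement_entropy helper, its eps constants, and the train_epoch wrapper are our own naming and simplifications, not the authors' released implementation (see the linked repository for the official code).

import torch
import torch.nn.functional as F


def complement_entropy(logits, target, eps=1e-7):
    """Mean Shannon entropy of the predicted distribution restricted to the
    complement classes (all classes except the ground truth), renormalized
    so the complement probabilities sum to one per sample."""
    probs = F.softmax(logits, dim=1)                        # (N, K) class probabilities
    gt_prob = probs.gather(1, target.unsqueeze(1))          # (N, 1) ground-truth probability
    comp = probs / (1.0 - gt_prob + eps)                    # renormalize over complement classes
    mask = torch.ones_like(probs).scatter_(1, target.unsqueeze(1), 0.0)  # drop ground-truth entry
    entropy = -(comp * torch.log(comp + eps) * mask).sum(dim=1)
    return entropy.mean()


def train_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of COT-style training: two parameter updates per mini-batch."""
    model.train()
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # 1. Update parameters by the primary objective (cross entropy).
        optimizer.zero_grad()
        F.cross_entropy(model(inputs), labels).backward()
        optimizer.step()

        # 2. Update parameters by the complement objective: maximize the
        #    complement entropy by minimizing its negative.
        optimizer.zero_grad()
        (-complement_entropy(model(inputs), labels)).backward()
        optimizer.step()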
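
For the image-classification setup quoted in the last row, the optimizer and learning-rate schedule can be expressed as the sketch below. The hyperparameter values (SGD with momentum 0.9, weight decay 0.0001, initial learning rate 0.1 divided by 10 at the 100th and 150th epochs, 200 epochs, mini-batches of 128) come from the quoted text; the placeholder model and the commented call into the training loop above are illustrative assumptions only.

import torch
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder network; the paper trains state-of-the-art CNNs on these settings.
model = torch.nn.Linear(3 * 32 * 32, 10)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,            # initial learning rate
                            momentum=0.9,
                            weight_decay=1e-4)

# Divide the learning rate by 10 at the 100th and 150th epochs.
scheduler = MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

BATCH_SIZE = 128   # mini-batch size used when building the DataLoader
NUM_EPOCHS = 200

for epoch in range(NUM_EPOCHS):
    # train_epoch(model, loader, optimizer)  # COT loop sketched above (loader not shown)
    scheduler.step()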