Wasserstein Barycenter Model Ensembling

Authors: Pierre Dognin*, Igor Melnyk*, Youssef Mroueh*, Jerret Ross*, Cicero Dos Santos*, Tom Sercu*

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image captioning generation. These results show that the W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling." "In this Section we evaluate W. barycenter ensembling in the problems of attribute-based classification, multi-label prediction and in natural language generation in image captioning."
Researcher Affiliation | Collaboration | "Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos & Tom Sercu. IBM Research & MIT-IBM Watson AI Lab. Alphabetical order; equal contribution. {pdognin,mroueh,rossja,cicerons}@us.ibm.com, {igor.melnyk,tom.sercu1}@ibm.com"
Pseudocode | Yes | Algorithm 1: Balanced Barycenter for Multiclass Ensembling (Benamou et al., 2015) and Algorithm 2: Unbalanced Barycenter for Multilabel Ensembling (Chizat et al., 2018). A minimal sketch of the balanced barycenter iteration is given after the table.
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | "We use Animals with Attributes (Xian et al., 2017) which has 85 attributes and 50 classes." "We use MS-COCO (Lin et al., 2014) with 80 object categories." "The training was done on COCO dataset (Lin et al., 2014) using data splits from (Karpathy & Li, 2015a)."
Dataset Splits | Yes | "We split the data randomly in 30322 / 3500 / 3500 images for train / validation / test respectively." "MS-COCO is split into training (82K images), test (35K), and validation (5K) sets, following the Karpathy splits used in the community (Karpathy & Li, 2015b)." "The training was done on COCO dataset (Lin et al., 2014) using data splits from (Karpathy & Li, 2015a): training set of 113k images with 5 captions each, 5k validation set, and 5k test set."
Hardware Specification | Yes | "We report timing numbers over two GPU architectures, NVIDIA Tesla K80 and V100."
Software Dependencies | No | The paper mentions software such as PyTorch and the ADAM optimizer (Kingma & Ba, 2015), but it does not specify version numbers for these or any other software components needed for reproducibility.
Experiment Setup | Yes | "We selected the hyperparameters ε = 0.3 and λ = 2 on the validation split and report here the accuracies on the test split." "Training of the fc layer uses a 10⁻³ learning rate, while all fine-tunings use a 10⁻⁶ learning rate. All multi-label trainings use ADAM (Kingma & Ba, 2015) with (β1 = 0.9, β2 = 0.999) for learning rate management and are stopped at 40 epochs." "The model prediction µℓ, for ℓ = 1, ..., 5, was selected as the softmax output of the captioner's LSTM at the current time step, and each model's input was weighted equally: λℓ = 1/m. Once the barycenter p was computed, the result was fed into a beam search (beam size B = 5), whose output, in turn, was then given to the captioner's LSTM and the process continued until a stop symbol (EOS) was generated." Sketches of the decoding loop and the optimizer configuration follow the table.
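
The Pseudocode row above points to Algorithm 1 of Benamou et al. (2015). Below is a minimal NumPy sketch of that balanced entropic-regularized barycenter computed by iterative Bregman projections; the function name, the fixed iteration count `n_iter`, and the small numerical guards are illustrative choices, not the paper's.

```python
import numpy as np

def wasserstein_barycenter(mus, C, lambdas=None, eps=0.3, n_iter=100):
    """Entropic-regularized balanced Wasserstein barycenter of m histograms on a shared support.

    mus     : list of m probability vectors of length n (each sums to 1)
    C       : (n, n) ground-cost matrix between support points
    lambdas : barycentric weights, defaults to uniform 1/m (as in the captioning setup)
    eps     : entropic regularization strength
    """
    m, n = len(mus), C.shape[0]
    lambdas = np.full(m, 1.0 / m) if lambdas is None else np.asarray(lambdas)
    K = np.exp(-C / eps)                        # Gibbs kernel
    v = [np.ones(n) for _ in range(m)]
    for _ in range(n_iter):
        u = [mu / (K @ v_l + 1e-30) for mu, v_l in zip(mus, v)]
        # project onto the "all pushed-forward marginals are equal" constraint:
        # the barycenter is the weighted geometric mean of the K^T u_l
        p = np.exp(sum(lam * np.log(K.T @ u_l + 1e-30)
                       for lam, u_l in zip(lambdas, u)))
        v = [p / (K.T @ u_l + 1e-30) for u_l in u]
    return p / p.sum()                          # renormalize against numerical drift
```

On a shared vocabulary, the cost matrix C can for instance be built from distances between word embeddings, which is the kind of side information the paper exploits when ensembling captioners.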
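The Experiment Setup row describes how, at each decoding step, the barycenter of the captioners' softmax outputs is fed into a beam search. The sketch below reuses `wasserstein_barycenter` from above but substitutes greedy decoding for the paper's beam search (B = 5) to keep it short; the `model.step(token, state)` interface and the `bos_id`, `eos_id`, `max_len` names are hypothetical, not the paper's API.

```python
import numpy as np

def ensemble_decode(models, C, bos_id, eos_id, max_len=20, eps=0.3):
    """Greedy stand-in for the paper's barycenter-then-beam-search decoding loop."""
    states = [None] * len(models)
    token, caption = bos_id, []
    for _ in range(max_len):
        probs = []
        for i, model in enumerate(models):
            # hypothetical one-step interface: softmax over the vocabulary + new LSTM state
            p_i, states[i] = model.step(token, states[i])
            probs.append(p_i)
        # combine the m per-model distributions with uniform weights (lambda_l = 1/m)
        p = wasserstein_barycenter(probs, C, eps=eps)
        token = int(np.argmax(p))    # greedy pick; the paper feeds p into beam search (B = 5)
        if token == eos_id:          # stop once the EOS symbol is generated
            break
        caption.append(token)
    return caption
```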
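The same row quotes the multi-label training configuration (ADAM with β1 = 0.9, β2 = 0.999, a 10⁻³ learning rate for the fc layer and 10⁻⁶ for fine-tuning, stopped at 40 epochs). A minimal PyTorch sketch of those optimizer settings follows; the ResNet backbone, the single-optimizer grouping of both learning rates, and the BCE loss are illustrative assumptions, not taken from the paper.

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights=None)        # placeholder backbone (assumption)
model.fc = torch.nn.Linear(model.fc.in_features, 80)     # 80 MS-COCO object categories

optimizer = torch.optim.Adam(
    [
        {"params": model.fc.parameters(), "lr": 1e-3},    # newly trained fc layer
        {"params": [p for n, p in model.named_parameters()
                    if not n.startswith("fc.")], "lr": 1e-6},  # fine-tuned layers
    ],
    betas=(0.9, 0.999),
)
criterion = torch.nn.BCEWithLogitsLoss()  # typical multi-label objective (assumption)
```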