Wasserstein Barycenter Model Ensembling
Authors: Pierre Dognin*, Igor Melnyk*, Youssef Mroueh*, Jerret Ross*, Cicero Dos Santos*, Tom Sercu*
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show applications of Wasserstein ensembling in attribute-based classification, multi-label learning, and image caption generation. These results show that Wasserstein ensembling is a viable alternative to basic geometric or arithmetic mean ensembling. In this section we evaluate Wasserstein barycenter ensembling on the problems of attribute-based classification, multi-label prediction, and natural language generation in image captioning. |
| Researcher Affiliation | Collaboration | Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos & Tom Sercu. IBM Research & MIT-IBM Watson AI Lab. Alphabetical order; equal contribution. {pdognin,mroueh,rossja,cicerons}@us.ibm.com, {igor.melnyk,tom.sercu1}@ibm.com |
| Pseudocode | Yes | Algorithm 1: Balanced Barycenter for Multiclass Ensembling (Benamou et al., 2015) and Algorithm 2: Unbalanced Barycenter for Multilabel Ensembling (Chizat et al., 2018). A minimal sketch of the balanced iteration is given below the table. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We use Animals with Attributes (Xian et al., 2017), which has 85 attributes and 50 classes. We use MS-COCO (Lin et al., 2014) with 80 object categories. The training was done on the COCO dataset (Lin et al., 2014) using data splits from (Karpathy & Li, 2015a). |
| Dataset Splits | Yes | We split the data randomly in 30322 / 3500 / 3500 images for train / validation / test, respectively. MS-COCO is split into training (~82K images), test (~35K), and validation (5K) sets, following the Karpathy splits used in the community (Karpathy & Li, 2015b). The training was done on the COCO dataset (Lin et al., 2014) using data splits from (Karpathy & Li, 2015a): training set of 113K images with 5 captions each, 5K validation set, and 5K test set. |
| Hardware Specification | Yes | We report timing numbers over two GPU architectures, NVIDIA Tesla K80 and V100. |
| Software Dependencies | No | The paper mentions software such as PyTorch and the ADAM optimizer (Kingma & Ba, 2015), but it does not specify version numbers for these or other software components needed for reproducibility. |
| Experiment Setup | Yes | We selected the hyperparameters ε = 0.3 and λ = 2 on the validation split and report here the accuracies on the test split. Training of the fc layer uses a 10^-3 learning rate, while all fine-tunings use a 10^-6 learning rate. All multi-label trainings use ADAM (Kingma & Ba, 2015) with (β1 = 0.9, β2 = 0.999) for learning rate management and are stopped at 40 epochs. The model prediction μℓ, for ℓ = 1, ..., 5, was selected as the softmax output of the captioner's LSTM at the current time step, and each model's input was weighted equally: λℓ = 1/m. Once the barycenter p was computed, the result was fed into a beam search (beam size B = 5), whose output, in turn, was given back to the captioner's LSTM, and the process continued until a stop symbol (EOS) was generated. A sketch of this decode-time step also follows the table. |
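
For concreteness, here is a minimal NumPy sketch of the balanced entropic-regularized barycenter iteration in the spirit of Algorithm 1 (iterative Bregman projections, Benamou et al., 2015). The paper does not open-source an implementation, so the function name, argument names, and iteration count are our own choices, not the authors' code.

```python
# Sketch of the balanced Wasserstein barycenter via iterative Bregman
# projections (Benamou et al., 2015), as used for multiclass ensembling.
# Names (mus, C, lam, eps, n_iter) are illustrative, not from the paper.
import numpy as np

def wasserstein_barycenter(mus, C, lam=None, eps=0.3, n_iter=100):
    """Barycenter of histograms `mus` under ground cost C.

    mus : array (m, n), rows are probability vectors (model softmax outputs)
    C   : array (n, n), ground cost between vocabulary/label entries
    lam : array (m,), barycentric weights summing to 1 (uniform if None)
    eps : entropic regularization (the paper selects eps = 0.3 on validation)
    """
    m, n = mus.shape
    if lam is None:
        lam = np.full(m, 1.0 / m)   # lambda_l = 1/m, as in the paper
    K = np.exp(-C / eps)            # Gibbs kernel
    v = np.ones((m, n))
    for _ in range(n_iter):
        u = mus / (v @ K.T)         # Sinkhorn scaling: u_l = mu_l / (K v_l)
        KTu = u @ K                 # K^T u_l, one row per model
        # barycenter = lambda-weighted geometric mean of projected marginals
        p = np.exp(lam @ np.log(KTu + 1e-30))
        v = p / (KTu + 1e-30)       # rescale toward the common barycenter
    return p / p.sum()              # normalize against numerical drift
```

The per-model updates are standard Sinkhorn scalings; the geometric-mean step is what couples the m transport problems to a single barycenter p. Algorithm 2 (unbalanced, Chizat et al., 2018) relaxes the exact marginal projections, which is where the λ hyperparameter from the setup row enters.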
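And a hypothetical decode-time wrapper matching the captioning setup described in the Experiment Setup row: at each LSTM step, the m = 5 captioners' softmax outputs are fused into one distribution, which the beam search (beam size B = 5) then consumes. `step_softmaxes` and `cost_matrix` are illustrative names.

```python
# Hypothetical per-step fusion for caption decoding: combine the five
# captioners' current-step softmax outputs into one barycenter, which the
# beam search consumes. Reuses wasserstein_barycenter() from above.
def ensemble_decode_step(step_softmaxes, cost_matrix):
    # step_softmaxes: list of 5 arrays, each a softmax over the vocabulary
    mus = np.stack(step_softmaxes)  # shape (5, vocab_size)
    return wasserstein_barycenter(mus, cost_matrix, eps=0.3)
```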