Learning Robust Representations via Multi-View Information Bottleneck

Authors: Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we demonstrate the effectiveness of our model against state-of-the-art baselines in both the multi-view and single-view setting. In the single-view setting, we also estimate the coordinates on the Information Plane for each of the baseline methods as well as our method to validate the theory in Section 3."
Researcher Affiliation | Collaboration | Marco Federici, University of Amsterdam (m.federici@uva.nl); Anjan Dutta, University of Exeter (a.dutta@exeter.ac.uk); Patrick Forré, University of Amsterdam (p.d.forre@uva.nl); Nate Kushman, Microsoft Research (nkushman@microsoft.com); Zeynep Akata, University of Tuebingen (zeynep.akata@uni-tuebingen.de)
Pseudocode | Yes | Algorithm 1: L_MIB(θ, ψ; β, B) (a hedged sketch of this loss follows the table)
Open Source Code | Yes | Code available at https://github.com/mfederici/Multi-View-Information-Bottleneck
Open Datasets | Yes | "Dataset. The Sketchy dataset (Sangkloy et al., 2016) consists of 12,500 images and 75,471 hand-drawn sketches of objects from 125 classes. As in Liu et al. (2017), we also include another 60,502 images from ImageNet (Deng et al., 2009) from the same classes..." and "Dataset. The MIR-Flickr dataset (Huiskes & Lew, 2008) consists of 1M images..." and "Dataset. The dataset is generated from MNIST."
Dataset Splits | Yes | "The labeled set contains 5 different splits of train, validation and test sets of size 10K/5K/10K respectively."
Hardware Specification | Yes | "The Titan Xp and Titan V used for this research were donated by the NVIDIA Corporation."
Software Dependencies | No | "All the experiments have been performed using the Adam optimizer with a learning rate of 10^-4 for both encoders and the estimation network."
Experiment Setup | Yes | "To facilitate the optimization, the hyper-parameter β is slowly increased during training, starting from a small value 10^-4 to its final value with an exponential schedule." and "Each training iteration used batches of size B = 128." and "All the experiments have been performed using the Adam optimizer with a learning rate of 10^-4 for both encoders and the estimation network." (a β-schedule sketch follows the table)
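
The Algorithm 1 row above refers to the L_MIB objective, which pairs a mutual-information term between the two view representations with a symmetrized KL penalty weighted by β. The sketch below is an illustrative reconstruction only, not the authors' code: the module names enc1, enc2, and critic are hypothetical, the posteriors are assumed Gaussian, and the MI term uses a Jensen-Shannon-style bound computed by the estimation network; the linked repository contains the actual implementation.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

def mib_loss(enc1, enc2, critic, v1, v2, beta):
    """One step of an MIB-style objective (hedged sketch, not the authors' code).

    enc1, enc2 : hypothetical encoders returning (mu, log_sigma) for p(z|v1), p(z|v2)
    critic     : hypothetical estimation network scoring (z1, z2) pairs
    beta       : trade-off between discarding view-specific information and
                 preserving the information shared by the two views
    """
    mu1, logs1 = enc1(v1)
    mu2, logs2 = enc2(v2)
    p1, p2 = Normal(mu1, logs1.exp()), Normal(mu2, logs2.exp())

    # Sample representations with the reparameterization trick.
    z1, z2 = p1.rsample(), p2.rsample()

    # Symmetrized KL between the two posteriors (redundancy penalty).
    d_skl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1)).sum(-1).mean()

    # Jensen-Shannon mutual-information lower bound: joint pairs vs. shuffled pairs.
    joint = critic(z1, z2)
    marginal = critic(z1, z2[torch.randperm(z2.size(0), device=z2.device)])
    mi_estimate = (-F.softplus(-joint)).mean() - F.softplus(marginal).mean()

    # Maximize shared information, penalize superfluous (view-specific) information.
    return -mi_estimate + beta * d_skl
```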
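
The Experiment Setup row quotes an exponential annealing schedule that grows β from 10^-4 to its final value. A minimal sketch of such a schedule, assuming log-linear interpolation and a placeholder final value of 1.0 (the paper's final β and schedule length are not quoted here):

```python
import math

def beta_schedule(step, total_steps, beta_start=1e-4, beta_end=1.0):
    """Exponentially anneal beta from beta_start to beta_end over total_steps."""
    # Fraction of training completed, clipped to [0, 1].
    t = min(step / total_steps, 1.0)
    # Linear interpolation in log-space yields exponential growth of beta.
    return math.exp((1.0 - t) * math.log(beta_start) + t * math.log(beta_end))
```

For example, beta_schedule(0, 10000) returns 1e-4 and beta_schedule(10000, 10000) returns 1.0.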