Learning Robust Representations via Multi-View Information Bottleneck
Authors: Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we demonstrate the effectiveness of our model against state-of-the-art baselines in both the multi-view and single-view setting. In the single-view setting, we also estimate the coordinates on the Information Plane for each of the baseline methods as well as our method to validate the theory in Section 3. |
| Researcher Affiliation | Collaboration | Marco Federici, University of Amsterdam, m.federici@uva.nl; Anjan Dutta, University of Exeter, a.dutta@exeter.ac.uk; Patrick Forré, University of Amsterdam, p.d.forre@uva.nl; Nate Kushman, Microsoft Research, nkushman@microsoft.com; Zeynep Akata, University of Tuebingen, zeynep.akata@uni-tuebingen.de |
| Pseudocode | Yes | Algorithm 1: L_MIB(θ, ψ; β, B) |
| Open Source Code | Yes | Code available at https://github.com/mfederici/Multi-View-Information-Bottleneck |
| Open Datasets | Yes | Dataset. The Sketchy dataset (Sangkloy et al., 2016) consists of 12,500 images and 75,471 hand-drawn sketches of objects from 125 classes. As in Liu et al. (2017), we also include another 60,502 images from ImageNet (Deng et al., 2009) from the same classes... and Dataset. The MIR-Flickr dataset (Huiskes & Lew, 2008) consists of 1M images... and Dataset. The dataset is generated from MNIST. |
| Dataset Splits | Yes | The labeled set contains 5 different splits of train, validation and test sets of size 10K/5K/10K respectively. |
| Hardware Specification | Yes | The Titan Xp and Titan V used for this research were donated by the NVIDIA Corporation. |
| Software Dependencies | No | All the experiments have been performed using the Adam optimizer with a learning rate of 10^-4 for both encoders and the estimation network. |
| Experiment Setup | Yes | To facilitate the optimization, the hyper-parameter β is slowly increased during training, starting from a small value 10^-4 to its final value with an exponential schedule. and Each training iteration used batches of size B = 128. and All the experiments have been performed using the Adam optimizer with a learning rate of 10^-4 for both encoders and the estimation network. |
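
The Pseudocode row above quotes Algorithm 1, which defines the L_MIB objective: maximize the mutual information between the two views' representations while penalizing the symmetrized KL divergence between their posteriors, weighted by β. The PyTorch sketch below is illustrative only; the names `mib_loss`, `critic`, `p_z1_given_v1`, and `p_z2_given_v2` are assumptions, and the Jensen-Shannon critic bound stands in for whichever MI estimator the released code actually uses.

```python
import torch
import torch.nn.functional as F

def mib_loss(p_z1_given_v1, p_z2_given_v2, critic, beta):
    """Hedged sketch of the L_MIB objective (Algorithm 1), not the authors' code.

    p_z1_given_v1, p_z2_given_v2: torch.distributions.Normal posteriors
        (assumed factorized Gaussians) produced by the two view encoders.
    critic: the MI estimation network, scoring (z1, z2) pairs (hypothetical).
    beta: trade-off weight between the MI and symmetrized-KL terms.
    """
    # Reparameterized samples from each view's posterior.
    z1 = p_z1_given_v1.rsample()
    z2 = p_z2_given_v2.rsample()

    # Jensen-Shannon-style lower bound on I(z1; z2): joint pairs vs.
    # shuffled (marginal) pairs within the batch.
    joint = critic(z1, z2)
    marginal = critic(z1, z2[torch.randperm(z2.size(0))])
    mi_estimate = (-F.softplus(-joint)).mean() - F.softplus(marginal).mean()

    # Symmetrized KL divergence between the two posteriors,
    # summed over latent dimensions, averaged over the batch.
    skl = 0.5 * (
        torch.distributions.kl_divergence(p_z1_given_v1, p_z2_given_v2)
        + torch.distributions.kl_divergence(p_z2_given_v2, p_z1_given_v1)
    ).sum(-1).mean()

    # L_MIB = -I(z1; z2) + beta * D_SKL
    return -mi_estimate + beta * skl
```

The symmetrized-KL term is what encourages the two view-specific posteriors to agree, so the representation discards view-private information while the MI term keeps the shared content.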
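The Experiment Setup row describes an exponential annealing of β from 10^-4 to its final value, Adam with learning rate 10^-4 for both encoders and the estimation network, and batches of size 128. Below is a minimal sketch of that configuration; the stub modules, the final β value, and the number of annealing steps are placeholders, not values reported in the paper.

```python
import torch
import torch.nn as nn

# Stand-ins for the two view encoders and the MI estimation network;
# only the numeric values 1e-4 (initial beta, learning rate) and 128
# (batch size) come from the quoted setup.
encoder_v1 = nn.Linear(784, 64)
encoder_v2 = nn.Linear(784, 64)
critic = nn.Bilinear(64, 64, 1)

BETA_START, BETA_FINAL = 1e-4, 1.0   # final beta: assumed placeholder
ANNEAL_STEPS = 100_000               # assumed length of the annealing phase
BATCH_SIZE = 128
LEARNING_RATE = 1e-4

def beta_schedule(step: int) -> float:
    """Exponential interpolation from BETA_START to BETA_FINAL."""
    t = min(step / ANNEAL_STEPS, 1.0)
    return BETA_START * (BETA_FINAL / BETA_START) ** t

# A single Adam optimizer over both encoders and the estimation network.
optimizer = torch.optim.Adam(
    list(encoder_v1.parameters())
    + list(encoder_v2.parameters())
    + list(critic.parameters()),
    lr=LEARNING_RATE,
)
```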