Sparse Communication via Mixed Distributions

Authors: António Farinhas, Wilker Aziz, Vlad Niculae, André Martins

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables.
Researcher Affiliation | Collaboration | António Farinhas (1), Wilker Aziz (2), Vlad Niculae (3), André F. T. Martins (1,4); (1) Instituto de Telecomunicações, Instituto Superior Técnico (Lisbon ELLIS Unit); (2) ILLC, University of Amsterdam; (3) IvI, University of Amsterdam; (4) Unbabel
Pseudocode | No | The paper describes algorithms in text (e.g., 'forward algorithm', 'backward algorithm') but does not present them as structured pseudocode or labeled algorithm blocks.
Open Source Code | Yes | Our code is publicly available. Additionally, code and instructions to reproduce our experiments are available at https://github.com/deep-spin/sparse-communication.
Open Datasets | Yes | Data. The dataset consists of a subset of ImageNet (Deng et al., 2009)... To get the dataset visit https://github.com/DianeBouchacourt/SignalingGame (Bouchacourt & Baroni, 2018). We use Fashion-MNIST (Xiao et al., 2017)... We use stochastically binarized MNIST (LeCun et al., 2010).
Dataset Splits | Yes | The first 55,000 instances are used for training, the next 5,000 instances for development and the remaining 10,000 for test.
Hardware Specification | Yes | Our infrastructure consists of 5 machines with the specifications shown in Table 5 (Computing infrastructure): (1) 4× Titan Xp 12GB GPUs, 16× AMD Ryzen 1950X @ 3.40GHz CPUs, 128GB RAM; (2) 4× GTX 1080 Ti 12GB GPUs, 8× Intel i7-9800X @ 3.80GHz CPUs, 128GB RAM; (3) 3× RTX 2080 Ti 12GB GPUs, 12× AMD Ryzen 2920X @ 3.50GHz CPUs, 128GB RAM; (4) 3× RTX 2080 Ti 12GB GPUs, 12× AMD Ryzen 2920X @ 3.50GHz CPUs, 128GB RAM; (5) 2× GTX Titan X 12GB GPUs, 12× Intel Xeon E5-1650 v3 @ 3.50GHz CPUs, 64GB RAM.
Software Dependencies | Yes | This work was built on open-source software; we acknowledge Van Rossum & Drake (2009); Oliphant (2006); Virtanen et al. (2020); Walt et al. (2011); Pedregosa et al. (2011); and Paszke et al. (2019).
Experiment Setup | Yes | We choose the best hyperparameter configuration by doing a grid search on the learning rate (0.01, 0.005, 0.001)... the temperature is annealed using the schedule τ = max(0.5, exp(−rt))... For the K-D Hard Concrete we use a scaling constant λ = 1.1 and for Gaussian Sparsemax we set Σ = I. All models were trained for 500 epochs using the Adam optimizer with a batch size of 64.
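The dataset-split row above (first 55,000 MNIST training images for training, the next 5,000 for development, and the remaining 10,000 for test) can be reproduced in a few lines. The sketch below is only an illustration of that split: it uses torchvision as an assumed convenience rather than the authors' own data-loading code, and all paths and variable names are hypothetical.

```python
# Minimal sketch of the reported MNIST split (55,000 / 5,000 / 10,000).
# torchvision is an assumption here, not necessarily the authors' loader.
from torch.utils.data import Subset
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
full_train = datasets.MNIST("data/", train=True, download=True, transform=to_tensor)
test_set = datasets.MNIST("data/", train=False, download=True, transform=to_tensor)

train_set = Subset(full_train, range(0, 55_000))      # first 55,000 instances
dev_set = Subset(full_train, range(55_000, 60_000))   # next 5,000 instances
# test_set already holds the remaining 10,000 instances
```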
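For the experiment-setup row, a minimal training-loop skeleton illustrates the reported configuration: a learning rate drawn from {0.01, 0.005, 0.001}, a temperature annealed as τ = max(0.5, exp(−rt)), and 500 epochs of Adam with batch size 64. The placeholder model, the value of the annealing rate r, and all variable names are assumptions for illustration, not taken from the released code.

```python
# Sketch of the reported training configuration; the model and the value of the
# annealing rate `r` are illustrative assumptions only.
import math
import torch

model = torch.nn.Linear(784, 10)                             # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)   # lr from {0.01, 0.005, 0.001}

r = 1e-4  # annealing rate (not reported in the excerpt above)
for epoch in range(500):                                     # "trained for 500 epochs"
    tau = max(0.5, math.exp(-r * epoch))                     # temperature schedule
    # ... iterate over mini-batches of size 64, using `tau` in the relaxed
    #     (e.g., Hard Concrete / Gaussian Sparsemax) sampling step ...
```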