Sparse Communication via Mixed Distributions
Authors: António Farinhas, Wilker Aziz, Vlad Niculae, André F. T. Martins
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables. |
| Researcher Affiliation | Collaboration | António Farinhas 1, Wilker Aziz 2, Vlad Niculae 3, André F. T. Martins 1,4 1Instituto de Telecomunicações, Instituto Superior Técnico (Lisbon ELLIS Unit), 2ILLC, University of Amsterdam, 3IvI, University of Amsterdam, 4Unbabel |
| Pseudocode | No | The paper describes algorithms in text (e.g., 'forward algorithm', 'backward algorithm') but does not present them as structured pseudocode or labeled algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available. Additionally, code and instructions to reproduce our experiments are available at https://github.com/deep-spin/sparse-communication. |
| Open Datasets | Yes | Data. The dataset consists of a subset of ImageNet (Deng et al., 2009)...To get the dataset visit https://github.com/DianeBouchacourt/SignalingGame (Bouchacourt & Baroni, 2018). We use Fashion-MNIST (Xiao et al., 2017)...We use stochastically binarized MNIST (LeCun et al., 2010). |
| Dataset Splits | Yes | The first 55,000 instances are used for training, the next 5,000 instances for development and the remaining 10,000 for test. (A split sketch follows this table.) |
| Hardware Specification | Yes | Our infrastructure consists of 5 machines with the specifications shown in Table 5 (Computing infrastructure; GPUs, CPU, RAM per machine): 1) 4× Titan Xp 12GB, 16× AMD Ryzen 1950X @ 3.40GHz, 128GB; 2) 4× GTX 1080 Ti 12GB, 8× Intel i7-9800X @ 3.80GHz, 128GB; 3) 3× RTX 2080 Ti 12GB, 12× AMD Ryzen 2920X @ 3.50GHz, 128GB; 4) 3× RTX 2080 Ti 12GB, 12× AMD Ryzen 2920X @ 3.50GHz, 128GB; 5) 2× GTX Titan X 12GB, 12× Intel Xeon E5-1650 v3 @ 3.50GHz, 64GB. |
| Software Dependencies | Yes | This work was built on open-source software; we acknowledge Van Rossum & Drake (2009); Oliphant (2006); Virtanen et al. (2020); Walt et al. (2011); Pedregosa et al. (2011), and Paszke et al. (2019). |
| Experiment Setup | Yes | We choose the best hyperparameter configuration by doing a grid search on the learning rate (0.01, 0.005, 0.001)...the temperature is annealed using the schedule τ = max(0.5, exp(−rt))...For the K-D Hard Concrete we use a scaling constant λ = 1.1 and for Gaussian Sparsemax we set Σ = I. All models were trained for 500 epochs using the Adam optimizer with a batch size of 64. (A training-setup sketch follows this table.) |
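
For concreteness, here is a minimal sketch of the 55,000/5,000/10,000 MNIST split and the stochastic binarization quoted in the dataset rows above. It assumes `torchvision`; the paper's actual loading code lives in the deep-spin/sparse-communication repository and may differ.

```python
# Sketch of the 55k/5k/10k MNIST split described above (assumption: torchvision).
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Stochastic binarization (LeCun et al., 2010): treat grayscale intensities as
# Bernoulli probabilities and resample binary pixels on every access.
binarize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(torch.bernoulli),
])

full_train = datasets.MNIST("data", train=True, download=True, transform=binarize)
test_set = datasets.MNIST("data", train=False, download=True, transform=binarize)

# First 55,000 instances for training, the next 5,000 for development,
# and the standard 10,000-instance test set.
train_set = Subset(full_train, range(55_000))
dev_set = Subset(full_train, range(55_000, 60_000))

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
```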
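
The experiment-setup row can likewise be read as a small training skeleton: a learning-rate grid for Adam and the annealed temperature τ = max(0.5, exp(−rt)). The model and the annealing rate `r` below are placeholders, not values reported in the excerpt.

```python
# Hedged sketch of the quoted setup: Adam, batch size 64, 500 epochs,
# temperature schedule tau = max(0.5, exp(-r*t)).
import math
import torch
from torch import nn

def tau_schedule(t: int, r: float = 1e-4) -> float:
    """tau = max(0.5, exp(-r*t)); the rate r is an assumption, not reported here."""
    return max(0.5, math.exp(-r * t))

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy stand-in model

for lr in (0.01, 0.005, 0.001):  # grid-searched learning rates from the excerpt
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # ... train for 500 epochs with batch size 64, calling tau_schedule(step)
    #     at each update to anneal the relaxation temperature.
```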