Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Emergent Communication at Scale
Authors: Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, Bilal Piot
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Overall, our experiments provide a large spectrum of observations, both positive and negative. |
| Researcher Affiliation | Industry | Contributed equally. Corresponding authors: EMAIL |
| Pseudocode | Yes | Listing 2: Face reconstruction Head |
| Open Source Code | Yes | Source code: github.com/deepmind/emergent_communication_at_scale |
| Open Datasets | Yes | We use the ImageNet (Deng et al., 2009; Russakovsky et al., 2015), and CelebA datasets (Liu et al., 2015), which respectively contain 1400k and 200k labelled images. |
| Dataset Splits | Yes | In our experiments, we use 99% of the official train set for training, i.e., 1300k images, the last 1% of the train set for validation, i.e., 13k images, and the official validation set as our test set (i.e., 50k images). |
| Hardware Specification | Yes | Table 5: Computational requirements for our base setup. GPU memory refers to the peak GPU memory usage. Device: p100, v100 |
| Software Dependencies | No | The paper mentions 'Jaxline pipeline (Babuschkin et al., 2020)' and 'Adam optimisers (Kingma & Ba, 2015)' as software used. However, it does not specify version numbers for these or any other software components or libraries. |
| Experiment Setup | Yes | Table 3: Hyper-parameters values across datasets and settings. Learning rate lr: 0.0001; Batch training size \|X\|: 1024; Number of Candidates \|C\|: 1024; Number of agent sampled P: min(N, 10); KL coefficient β: 0.5; KL EMA η: 0.99; Entropy Coefficient α: 0.0001; Vocabulary size \|W\|: 20; Message Length T: 10; Imitation EMA µ: 0.99 |
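
The hyper-parameter values and the 99%/1% train/validation split quoted in the table can be collected into a small configuration sketch. This is only an illustration, not code from the authors' repository: the names `HyperParams`, `num_sampled_agents`, and `split_train_val` are hypothetical, and the 1300k image count is the rounded figure quoted above rather than an exact dataset size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    """Values copied from the quoted Table 3; field names are hypothetical."""

    learning_rate: float = 1e-4        # lr
    batch_size: int = 1024             # |X|
    num_candidates: int = 1024         # |C|
    kl_coefficient: float = 0.5        # beta
    kl_ema: float = 0.99               # eta
    entropy_coefficient: float = 1e-4  # alpha
    vocab_size: int = 20               # |W|
    message_length: int = 10           # T
    imitation_ema: float = 0.99        # mu

    def num_sampled_agents(self, population_size: int) -> int:
        # P = min(N, 10), where N is the population size.
        return min(population_size, 10)


def split_train_val(num_train_images: int, train_percent: int = 99):
    """Keep the first `train_percent`% of indices for training and the rest
    for validation. Integer arithmetic avoids float rounding errors."""
    n_train = num_train_images * train_percent // 100
    return range(0, n_train), range(n_train, num_train_images)


if __name__ == "__main__":
    hp = HyperParams()
    # Assumes the rounded 1300k train-set size quoted in the table.
    train_idx, val_idx = split_train_val(1_300_000)
    print(hp.num_sampled_agents(population_size=3), len(train_idx), len(val_idx))
```

With the rounded 1300k figure, the validation slice holds 13,000 indices, which matches the 13k images quoted in the Dataset Splits row.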