Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Emergent Communication at Scale

Authors: Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, Bilal Piot

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Overall, our experiments provide a large spectrum of observations, both positive and negative. |
| Researcher Affiliation | Industry | Contributed equally. Corresponding authors: EMAIL |
| Pseudocode | Yes | Listing 2: Face reconstruction Head |
| Open Source Code | Yes | Source code: github.com/deepmind/emergent_communication_at_scale |
| Open Datasets | Yes | We use the ImageNet (Deng et al., 2009; Russakovsky et al., 2015), and CelebA datasets (Liu et al., 2015), which respectively contain 1400k and 200k labelled images. |
| Dataset Splits | Yes | In our experiments, we use 99% of the official train set for training, i.e., 1300k images, the last 1% of the train set for validation, i.e., 13k images, and the official validation set as our test set (i.e., 50k images). |
| Hardware Specification | Yes | Table 5: Computational requirements for our base setup. GPU memory refers to the peak GPU memory usage. Device: p100, v100 |
| Software Dependencies | No | The paper mentions 'Jaxline pipeline (Babuschkin et al., 2020)' and 'Adam optimisers (Kingma & Ba, 2015)' as software used. However, it does not specify version numbers for these or any other software components or libraries. |
| Experiment Setup | Yes | Table 3: Hyper-parameters values across datasets and settings. Learning rate lr: 0.0001; Batch training size \|X\|: 1024; Number of Candidates \|C\|: 1024; Number of agent sampled P: min(N, 10); KL coefficient β: 0.5; KL EMA η: 0.99; Entropy Coefficient α: 0.0001; Vocabulary size \|W\|: 20; Message Length T: 10; Imitation EMA µ: 0.99 |
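
The Experiment Setup and Dataset Splits rows above can be restated as a small configuration sketch. This is only an illustration of the quoted values, not code from the paper's repository: the class name `BaseSetup`, its field names, and the helper `split_sizes` are hypothetical, and the 1,300,000 figure used in the usage note is the rounded "1300k" from the quote rather than an exact image count.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BaseSetup:
    """Hypothetical container for the Table 3 values quoted above."""

    learning_rate: float = 1e-4        # lr
    batch_size: int = 1024             # |X|
    num_candidates: int = 1024         # |C|
    max_sampled_agents: int = 10       # upper bound in P = min(N, 10)
    kl_coefficient: float = 0.5        # beta
    kl_ema: float = 0.99               # eta
    entropy_coefficient: float = 1e-4  # alpha
    vocabulary_size: int = 20          # |W|
    message_length: int = 10           # T
    imitation_ema: float = 0.99        # mu

    def num_sampled_agents(self, population_size: int) -> int:
        """P = min(N, 10), where N is the population size."""
        return min(population_size, self.max_sampled_agents)


def split_sizes(num_official_train: int, validation_percent: int = 1) -> Tuple[int, int]:
    """Return (train, validation) sizes when the last `validation_percent`
    percent of the official train set is held out for validation.

    Integer arithmetic is used so the result does not depend on float rounding.
    """
    num_validation = num_official_train * validation_percent // 100
    return num_official_train - num_validation, num_validation
```

With the rounded figure from the quote, `split_sizes(1_300_000)` holds out 13,000 images for validation, which matches the "13k" stated in the Dataset Splits row; the exact counts depend on the true size of the official train set.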