Emergent Communication: Generalization and Overfitting in Lewis Games

Authors: Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on this decomposition, we empirically examine the evolution of these two losses during the learning process (Section 5). Unless specified, all our experiments are run on the reconstruction game defined in Section 2.1.
Researcher Affiliation | Collaboration | Mathieu Rita (INRIA, Paris) mathieu.rita@inria.fr; Corentin Tallec, Paul Michel, Jean-Bastien Grill (DeepMind) [corentint,paulmiche,jbgrill]@deepmind.com; Olivier Pietquin (Google Research, Brain Team) pietquin@google.com; Emmanuel Dupoux (EHESS, ENS-PSL, CNRS, INRIA, Meta AI Research) emmanuel.dupoux@gmail.com; Florian Strub (DeepMind) fstrub@deepmind.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation is based on the EGG toolkit [39] and the code is available at https://github.com/MathieuRita/Population.
Open Datasets | Yes | We thus train our agents on a discriminative game on top of the CelebA [57] and ImageNet [69, 19] datasets while applying previous protocol. Training, validation and test sets are randomly drawn from this pool of objects (uniformly and without overlap), and are respectively composed of 4000, 1000 and 1000 elements.
Dataset Splits | Yes | Training, validation and test sets are randomly drawn from this pool of objects (uniformly and without overlap), and are respectively composed of 4000, 1000 and 1000 elements. (A hedged split sketch appears below the table.)
Hardware Specification | Yes | Each experiment runs on a single V100-32G GPU.
Software Dependencies | No | Our models are implemented in PyTorch [64] and are optimized using Adam [42]. No specific version numbers for software are provided.
Experiment Setup | Yes | The agents are optimized using Adam [42] with a learning rate of 5·10⁻⁴, β1 = 0.9 and β2 = 0.999, and a batch size of 1024. For the speaker we use policy gradient [76], with a baseline computed as the average reward within the minibatch, and an entropy regularization of 0.01 added to the speaker's loss [82]. In all experiments, we select the best models by early stopping. (A hedged training-step sketch appears below the table.)
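
The Dataset Splits row reports 4000 training, 1000 validation, and 1000 test elements drawn uniformly and without overlap from a single pool. Below is a minimal sketch of such a split, assuming the objects are simply indexed; the pool size, seed, and variable names are illustrative and do not come from the released code.

```python
import torch

# Pool of 4000 + 1000 + 1000 = 6000 objects, as reported in the paper.
POOL_SIZE = 6000
generator = torch.Generator().manual_seed(0)  # seed is illustrative

# A random permutation sliced into three parts gives uniform,
# non-overlapping train / validation / test index sets.
perm = torch.randperm(POOL_SIZE, generator=generator)
train_idx = perm[:4000]
valid_idx = perm[4000:5000]
test_idx = perm[5000:6000]

# Indices in a permutation are unique, so the three splits are disjoint.
assert len(set(perm.tolist())) == POOL_SIZE
```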
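
The Experiment Setup row lists Adam with a 5·10⁻⁴ learning rate, β = (0.9, 0.999), a batch size of 1024, policy gradient for the speaker with the batch-average reward as baseline, and a 0.01 entropy bonus. The sketch below shows what one such update could look like in PyTorch; the speaker/listener interfaces (returning messages, log-probabilities, entropies, and a per-example reconstruction loss) are assumptions for illustration, not the released EGG-based implementation.

```python
import torch


def reinforce_step(speaker, listener, objects, optimizer, entropy_coef=0.01):
    """One hypothetical update using the hyper-parameters reported in the paper.

    Assumed interfaces (placeholders): speaker(objects) returns
    (message, log_prob, entropy) for a sampled discrete message, and
    listener(message, objects) returns a per-example reconstruction loss.
    """
    message, log_prob, entropy = speaker(objects)
    reconstruction_loss = listener(message, objects)    # differentiable, shape (batch,)
    reward = -reconstruction_loss.detach()              # reward = negative loss

    baseline = reward.mean()                            # average reward in the minibatch
    speaker_loss = -((reward - baseline) * log_prob).mean()      # policy gradient (REINFORCE)
    speaker_loss = speaker_loss - entropy_coef * entropy.mean()  # entropy regularization

    listener_loss = reconstruction_loss.mean()          # listener trained by backprop

    optimizer.zero_grad()
    (speaker_loss + listener_loss).backward()
    optimizer.step()
    return listener_loss.item()


# Reported optimizer settings: Adam, lr = 5e-4, betas = (0.9, 0.999), batch size 1024.
# optimizer = torch.optim.Adam(parameters, lr=5e-4, betas=(0.9, 0.999))
```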