Emergent Communication: Generalization and Overfitting in Lewis Games
Authors: Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on this decomposition, we empirically examine the evolution of these two losses during the learning process (Section 5). Unless specified, all our experiments are run on the reconstruction game defined in Section 2.1. |
| Researcher Affiliation | Collaboration | Mathieu Rita, INRIA, Paris, mathieu.rita@inria.fr; Corentin Tallec, Paul Michel, Jean-Bastien Grill, DeepMind, [corentint, paulmiche, jbgrill]@deepmind.com; Olivier Pietquin, Google Research, Brain Team, pietquin@google.com; Emmanuel Dupoux, EHESS, ENS-PSL, CNRS, INRIA, Meta AI Research, emmanuel.dupoux@gmail.com; Florian Strub, DeepMind, fstrub@deepmind.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is based on the EGG toolkit [39] and the code is available at https://github.com/MathieuRita/Population. |
| Open Datasets | Yes | We thus train our agents on a discriminative game on top of the CelebA [57] and ImageNet [69, 19] datasets while applying previous protocol. Training, validation and test sets are randomly drawn from this pool of objects (uniformly and without overlap), and are respectively composed of 4000, 1000 and 1000 elements. |
| Dataset Splits | Yes | Training, validation and test sets are randomly drawn from this pool of objects (uniformly and without overlap), and are respectively composed of 4000, 1000 and 1000 elements. (A hedged sketch of such a split is given below the table.) |
| Hardware Specification | Yes | Each experiment runs on a single V100-32G GPU |
| Software Dependencies | No | Our models are implemented in PyTorch [64] and are optimized using Adam [42]. No specific version numbers for software are provided. |
| Experiment Setup | Yes | The agents are optimized using Adam [42] with a learning rate of 5 × 10⁻⁴, β1 = 0.9 and β2 = 0.999, and a batch size of 1024. For the speaker we use policy gradient [76], with a baseline computed as the average reward within the minibatch, and an entropy regularization of 0.01 to the speaker's loss [82]. In all experiments, we select the best models by early stopping. (A hedged sketch of this optimization setup is given below the table.) |
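
The Dataset Splits row reports a random, non-overlapping 4000/1000/1000 partition. Below is a minimal sketch of such a split in PyTorch; the pool size, seed, and variable names are assumptions for illustration, not the authors' implementation.

```python
import torch

# Hedged sketch of a uniform, non-overlapping train/val/test split of
# 4000/1000/1000 objects. `pool_size` and the seed are assumptions; the
# paper only specifies the split sizes and that the draws do not overlap.
pool_size = 6000                              # assumed: exactly the 6000 objects needed
generator = torch.Generator().manual_seed(0)  # assumed seed, for reproducibility
perm = torch.randperm(pool_size, generator=generator)

train_idx = perm[:4000]     # 4000 training objects
val_idx = perm[4000:5000]   # 1000 validation objects
test_idx = perm[5000:6000]  # 1000 test objects
```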
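
The Experiment Setup row quotes the optimizer configuration and the speaker's policy-gradient objective. The sketch below illustrates that objective (REINFORCE with a minibatch-mean baseline and entropy regularization) together with the reported Adam settings; the `speaker_loss` function, tensor shapes, and the commented-out agent modules are illustrative assumptions, not the paper's code.

```python
import torch

# Hyper-parameters as reported in the paper.
LEARNING_RATE = 5e-4   # 5 × 10⁻⁴
BETAS = (0.9, 0.999)
BATCH_SIZE = 1024
ENTROPY_COEFF = 0.01   # entropy regularization on the speaker

def speaker_loss(log_probs, entropies, rewards):
    """REINFORCE loss with a minibatch-mean baseline and entropy regularization.

    log_probs: (batch,) summed log-probabilities of the sampled messages
    entropies: (batch,) entropy of the speaker's policy per sample
    rewards:   (batch,) listener reward obtained for each message
    """
    baseline = rewards.mean()                  # average reward within the minibatch
    advantage = (rewards - baseline).detach()  # detached so only log_probs carry gradient
    pg_loss = -(advantage * log_probs).mean()  # policy-gradient term
    return pg_loss - ENTROPY_COEFF * entropies.mean()

# Hypothetical agent modules; the actual architectures are specified in the paper
# and its repository (https://github.com/MathieuRita/Population).
# speaker, listener = Speaker(), Listener()
# optimizer = torch.optim.Adam(
#     list(speaker.parameters()) + list(listener.parameters()),
#     lr=LEARNING_RATE, betas=BETAS,
# )
```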