Emergent Communication for Numerical Concepts Generalization
Authors: Enshuai Zhou, Yifan Hao, Rui Zhang, Yuxuan Guo, Zidong Du, Xishan Zhang, Xinkai Song, Chao Wang, Xuehai Zhou, Jiaming Guo, Qi Yi, Shaohui Peng, Di Huang, Ruizhi Chen, Qi Guo, Yunji Chen
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results indicate impressive generalization capabilities to unseen quantities and regularity in the language that emerges from communication. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China; 2 State Key Lab of Processors, Institute of Computing Technology, CAS; 3 Cambricon Technologies; 4 University of Chinese Academy of Sciences; 5 Shanghai Innovation Center for Processor Technologies; 6 Intelligent Software Research Center, Institute of Software, CAS |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of its source code for the described methodology. |
| Open Datasets | Yes | The NumWorld dataset is developed based on the ShapeWorld dataset (Kuhnle and Copestake 2017), which serves as a synthetic dataset for visual reasoning. |
| Dataset Splits | Yes | Each sub-dataset contains both a training set and a validation set; the validation set is used to select the best model for the next training stage or to evaluate the model's generalization performance. Table 1 shows the detailed statistics of the three sub-datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | Our model implementation and training process are based on the PyTorch (Paszke et al. 2019) framework and partially adapted from the EGG (Kharitonov et al. 2019) toolkit. The paper mentions software frameworks but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Regarding the discrete communication channel connecting the speaker and listener, we set the maximum message length \|M\| = 3 and the vocabulary size \|V\| = 16. The speaker and listener are trained with the AdamW optimizer (Loshchilov and Hutter 2018). The learning rates vary across training stages, and different sub-modules within the model also have distinct learning rates. The temperature τ in the Gumbel-Softmax decays from 2.0 to 0.1 with a rate of 0.9. |
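
The setup row above names only a handful of concrete hyperparameters: \|M\| = 3, \|V\| = 16, AdamW with per-module learning rates, and a Gumbel-Softmax temperature decaying from 2.0 to 0.1 at rate 0.9. A minimal PyTorch sketch of such a discrete speaker–listener channel is given below. The speaker and listener architectures, hidden sizes, specific learning rates, batch size, task head, and per-epoch decay schedule are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 16  # |V| from the paper
MAX_LEN = 3      # |M| from the paper

class Speaker(nn.Module):
    """Toy speaker: encodes an input and emits MAX_LEN discrete symbols."""
    def __init__(self, input_dim=64, hidden_dim=128):  # sizes are assumptions
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.symbol_heads = nn.ModuleList(
            nn.Linear(hidden_dim, VOCAB_SIZE) for _ in range(MAX_LEN)
        )

    def forward(self, x, tau):
        h = torch.tanh(self.encoder(x))
        # One straight-through Gumbel-Softmax sample per message position.
        symbols = [F.gumbel_softmax(head(h), tau=tau, hard=True)
                   for head in self.symbol_heads]
        return torch.stack(symbols, dim=1)  # (batch, MAX_LEN, VOCAB_SIZE), one-hot

speaker = Speaker()
# Toy listener: reads the one-hot message and predicts a 10-way label.
listener = nn.Sequential(nn.Flatten(), nn.Linear(MAX_LEN * VOCAB_SIZE, 10))

# AdamW with distinct learning rates per sub-module, as the paper describes;
# the specific values here are placeholders.
optimizer = torch.optim.AdamW([
    {"params": speaker.parameters(), "lr": 1e-4},
    {"params": listener.parameters(), "lr": 5e-4},
])

tau = 2.0  # decays toward 0.1 with multiplicative rate 0.9 (per-epoch assumed)
for epoch in range(50):
    x = torch.randn(32, 64)                # placeholder inputs
    targets = torch.randint(0, 10, (32,))  # placeholder labels
    loss = F.cross_entropy(listener(speaker(x, tau)), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    tau = max(0.1, tau * 0.9)              # anneal temperature, floor at 0.1
```

The straight-through (`hard=True`) Gumbel-Softmax keeps the forward pass discrete while still letting gradients flow back to the speaker, which is the standard trick EGG-style emergent-communication setups rely on; annealing τ toward 0.1 makes the relaxed samples increasingly close to true one-hot symbols.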