Emergent Communication for Rules Reasoning
Authors: Yuxuan Guo, Yifan Hao, Rui Zhang, Enshuai Zhou, Zidong Du, Xishan Zhang, Xinkai Song, Yuanbo Wen, Yongwei Zhao, Xuehai Zhou, Jiaming Guo, Qi Yi, Shaohui Peng, Di Huang, Ruizhi Chen, Qi Guo, Yunji Chen
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that, in the Reasoning Game, a semantically stable and compositional language emerges to solve reasoning problems. The emerged language helps agents apply the extracted rules to the generalization of unseen context attributes, and to the transfer between different context attributes or even tasks. ... 4 Experimental Settings ... 5 Experimental Results |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China 2State Key Lab of Processors, Institute of Computing Technology, CAS 3Cambricon Technologies 4University of Chinese Academy of Sciences 5Shanghai Innovation Center for Processor Technologies 6Intelligent Software Research Center, Institute of Software, CAS |
| Pseudocode | Yes | Algorithm 1: Rule-based candidate panels generation algorithm (a hedged sketch of such a generator follows the table) |
| Open Source Code | Yes | Implementation. We use Python3 [42] to implement the rule-RAVEN dataset. Our model implementation is based on Pytorch [33] and EGG [20] toolkit. The code is available in supplementary materials. |
| Open Datasets | Yes | For each rule combination, we randomly generated 20 different reasoning problem cases (sampling from N^4 attribute values) with Algorithm 1 (4096 × 20 = 81920 in total), with half of these cases for training and half for testing. Besides, we use a held-out rule set to additionally generate 2000 cases for pre-training the speaker in the first stage of agent training (introduced in Sec 3.3). |
| Dataset Splits | Yes | Half of these cases are for training, half for testing. ... To test the generalization ability of the emerged language, we further create 4 data splits, corresponding to 4 levels of generalization ability: 1) In-distribution generalization (ID). The training and test sets share the same rule combinations but consist of different problems. ... For each of the remaining 7^4 − 300 = 2101 rule combinations, we generate 10 problems as the train data split (for ID, Inpo-ood, and Expo-ood-L2) and 10 problems as the ID data split (with a total of 2101 × 10 = 21010 problems for training and 21010 for ID). (The split arithmetic is recomputed in a sketch after the table.) |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments. Table 4 lists model hyperparameters, not hardware specifications. |
| Software Dependencies | No | The paper mentions "Python3 [42]", "Pytorch [33]", and "EGG [20]". While "Python3" indicates a version, PyTorch and EGG are mentioned without specific version numbers, and Python3 alone is not sufficient to meet the criteria for multiple versioned software components. |
| Experiment Setup | Yes | Optimization. Agent parameters are optimized by AdamW [30], with a learning rate of 3 × 10⁻³, a weight decay of 0.01, β₁ = 0.99, β₂ = 0.999, and a batch size of 512. For the speaker, we set the hyperparameter of entropy regularization λ = 0.01. (A minimal optimizer sketch follows the table.) |
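
The paper's Algorithm 1 is reported only as pseudocode. Below is a minimal, hypothetical Python sketch of what a rule-based candidate panel generator could look like: the attribute names, the two example rules (`constant`, `progression`), and the one-attribute-corruption distractor strategy are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch in the spirit of the paper's Algorithm 1
# (rule-based candidate panels generation); names and rules are assumed.

ATTRIBUTES = ["shape", "color", "size", "number"]  # assumed 4 attributes
NUM_VALUES = 8  # assumed number of discrete values per attribute


def apply_rule(rule, row):
    """Apply one per-attribute rule to the first two values of a row and
    return the value that completes the row."""
    a, b = row
    if rule == "constant":
        return a
    if rule == "progression":
        return (b + (b - a)) % NUM_VALUES
    raise ValueError(f"unknown rule: {rule}")


def generate_candidates(rules, context, num_candidates=8):
    """Build the correct answer panel from the rules, then distractors that
    each corrupt one attribute, so only one candidate satisfies every rule."""
    answer = {attr: apply_rule(rules[attr], context[attr]) for attr in ATTRIBUTES}
    candidates = [answer]
    while len(candidates) < num_candidates:
        distractor = dict(answer)
        attr = random.choice(ATTRIBUTES)
        distractor[attr] = random.choice(
            [v for v in range(NUM_VALUES) if v != answer[attr]]
        )
        if distractor not in candidates:
            candidates.append(distractor)
    random.shuffle(candidates)
    return candidates, candidates.index(answer)


# Usage example with assumed per-attribute rules and a two-value context row:
rules = {a: random.choice(["constant", "progression"]) for a in ATTRIBUTES}
context = {a: (random.randrange(NUM_VALUES), random.randrange(NUM_VALUES))
           for a in ATTRIBUTES}
panels, answer_index = generate_candidates(rules, context)
```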
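The dataset counts quoted in the Open Datasets and Dataset Splits rows can be recomputed directly. A small sketch, assuming the factorizations 8^4 = 4096 and 7^4 = 2401 behind the quoted totals (the variable names are mine):

```python
# Recomputing the dataset sizes quoted above; constants come from the
# paper's text, the factorizations are my reading of the totals.

values_per_attr = 8                                     # assumed: 8 rule choices per attribute
num_attributes = 4
rule_combinations = values_per_attr ** num_attributes   # 8^4 = 4096
problems_per_combination = 20
total_problems = rule_combinations * problems_per_combination
assert total_problems == 81920                          # half train, half test

# Generalization splits: 300 rule combinations held out of a 7^4 pool
remaining = 7 ** 4 - 300                                # 2101 combinations
train_problems = remaining * 10
assert train_problems == 21010                          # plus 21010 more for ID
```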
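The quoted optimization setup maps directly onto PyTorch's `torch.optim.AdamW`. A minimal sketch, where `agent` is a placeholder module standing in for the speaker/listener parameters, not the paper's model:

```python
import torch

agent = torch.nn.Linear(128, 128)  # placeholder for the agents' parameters

optimizer = torch.optim.AdamW(
    agent.parameters(),
    lr=3e-3,              # learning rate 3 × 10⁻³
    betas=(0.99, 0.999),  # β₁, β₂ as quoted
    weight_decay=0.01,
)
# Batch size is 512; the speaker's entropy regularization (λ = 0.01)
# enters through the loss term, not the optimizer configuration.
```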