Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

Authors: Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of our approach over existing techniques in terms of length generalization, efficiency, and interpretability for symbolic operations. Furthermore, it can be applied to LMs across different model scales, outperforming tool-calling methods in arithmetic reasoning tasks while maintaining superior inference efficiency. Our work highlights the potential of seamlessly unifying explicit rule learning via CoNNs and implicit pattern learning in LMs, paving the way for true symbolic comprehension capabilities.
Researcher Affiliation | Academia | (1) The Laboratory of Cognition and Decision Intelligence for Complex Systems, IA, CAS; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) College of Electrical and Information Engineering, Hunan University
Pseudocode | Yes | THE TRACR CODE OF PARITY CONN:

    def parity(sop) -> rasp.SOp:
        """Multiply the length of each token."""
        sop = rasp.SequenceMap(lambda x, y: x * y, sop, length).named("map_length")
        """Add each bit."""
        out = rasp.numerical(rasp.Aggregate(
            rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.TRUE).named("Select"),
            rasp.numerical(rasp.Map(lambda x: x, sop).named("map_length")),
            default=0).named("Aggregate"))
        """Calculate whether the remainder of dividing it by 2 is odd or even."""
        out = rasp.Map(lambda x: 0 if x % 2 == 0 else 1, out).named("Zipmap")
        return out

(A runnable compilation sketch for this program is given after the table.)
Open Source Code | Yes | The code is released at: https://github.com/wengsyx/Neural-Comprehension.
Open Datasets | Yes | GSM8K: https://github.com/openai/grade-school-math; SingleEq: https://gitlab.cs.washington.edu/ALGES/TACL2015; AddSub: https://www.cs.washington.edu/nlp/arithmetic; MultiArith: http://cogcomp.cs.illinois.edu/page/resource_view/98; SVAMP: https://github.com/arkilpatel/SVAMP
Dataset Splits | Yes | Our experimental design encompasses 1000 × 40 independent test sets, comprising problems with digit lengths varying from 1 to 40. Problems with 10 to 20 digits within this range are provided by us as training data for the methods based on implicit learning; during the testing phase, this range is referred to as In-Dist. Furthermore, we present results for both the Scratchpad (Anil et al., 2022) and Algorithmic (Zhou et al., 2022b) approaches. (An illustrative construction of such splits is sketched after the table.)
Hardware Specification | Yes | Table 10 displays the parameter settings for the T5 models during training, which is conducted on four NVIDIA A6000 GPUs with 48GB of memory each. For the GLM-130B, we employ the FasterTransformer framework to set up local inference with INT4 on eight NVIDIA GeForce RTX 3090 GPUs with 24GB of memory each.
Software Dependencies | No | The paper mentions using the "PyTorch framework (Paszke et al., 2019)", the "Adafactor optimizer (Shazeer & Stern, 2018)", and the "JAX" and "RASP" frameworks, but it does not specify version numbers for any of these software components.
Experiment Setup | Yes | For the T5 models, we employ the standard fine-tuning approach using the pretrained models as a starting point. We follow the pre-processing steps in the original T5 paper, which involve setting the input text max length to 150 and using the tokenizer to process the data. We use a batch size of 64 for all models and the Adafactor optimizer (Shazeer & Stern, 2018) with a learning rate of 1 × 10⁻⁴. The models are trained for a maximum of 20 epochs. We use a cosine learning rate schedule with a warm-up phase comprising 5% of the total number of training steps. We employ a dropout rate of 0.1 during training to mitigate overfitting. (An illustrative configuration sketch follows the table.)
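
For the Pseudocode row, the following is a minimal sketch of how the quoted parity program could be compiled into concrete transformer weights with DeepMind's tracr library. The vocabulary, maximum sequence length, and BOS marker are illustrative assumptions rather than settings reported in the paper, and the sketch assumes the parity() definition from that row is already in scope.

    # Minimal sketch: compile the parity() RASP program quoted in the Pseudocode row.
    from tracr.compiler import compiling
    from tracr.rasp import rasp

    # "length" used inside parity(): the number of input tokens, obtained as the
    # width of an all-true selector over the token positions.
    length = rasp.SelectorWidth(
        rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)).named("length")

    # Compile the RASP program into an actual transformer (a compiled neural network).
    # vocab, max_seq_len and the BOS marker are illustrative choices.
    model = compiling.compile_rasp_to_model(
        parity(rasp.tokens), vocab={0, 1}, max_seq_len=40, compiler_bos="BOS")

    # Run the compiled model on a bit string; the decoded output carries the parity.
    print(model.apply(["BOS", 1, 0, 1, 1]).decoded)

Because the selector in the Aggregate step attends to every position, the aggregation computes a mean over the sequence; multiplying each bit by the length beforehand turns that mean into the bit sum, which the final Map reduces modulo 2.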
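For the Dataset Splits row, the sketch below shows one way length-graded test sets of this shape could be constructed for a symbolic task such as addition; the task, the per-length sample count of 1000, and the random generation are illustrative assumptions, not the paper's released data.

    # Illustrative sketch: 40 independent test sets (1-40 digit operands),
    # with the 10-20 digit range treated as In-Dist for implicit-learning baselines.
    import random

    def sample_number(n_digits: int) -> int:
        """Uniformly sample an integer with exactly n_digits digits."""
        lo = 10 ** (n_digits - 1) if n_digits > 1 else 0
        hi = 10 ** n_digits - 1
        return random.randint(lo, hi)

    def build_test_set(n_digits: int, n_samples: int = 1000):
        """One test set: n_samples addition problems with n_digits-digit operands."""
        problems = []
        for _ in range(n_samples):
            a, b = sample_number(n_digits), sample_number(n_digits)
            problems.append({"question": f"{a} + {b} =", "answer": str(a + b)})
        return problems

    test_sets = {n: build_test_set(n) for n in range(1, 41)}   # 40 test sets
    in_dist = {n: test_sets[n] for n in range(10, 21)}         # 10-20 digits: In-Dist
    out_of_dist = {n: s for n, s in test_sets.items() if n not in in_dist}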
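For the Experiment Setup row, the following is a minimal configuration sketch of the described T5 fine-tuning using the Hugging Face transformers implementations of Adafactor and the cosine warm-up schedule. The model size, dataset size, and training loop are assumptions; only the quoted hyperparameters (batch size 64, learning rate 1 × 10⁻⁴, 20 epochs, 5% warm-up, dropout 0.1, max input length 150) come from the paper.

    # Sketch of the quoted T5 fine-tuning hyperparameters; the model checkpoint and
    # dataset size below are placeholders, not values from the paper.
    from transformers import (Adafactor, T5ForConditionalGeneration,
                              T5TokenizerFast, get_cosine_schedule_with_warmup)

    model = T5ForConditionalGeneration.from_pretrained("t5-base", dropout_rate=0.1)
    tokenizer = T5TokenizerFast.from_pretrained("t5-base")

    batch_size, max_epochs, num_train_examples = 64, 20, 100_000   # dataset size assumed
    total_steps = (num_train_examples // batch_size) * max_epochs

    # Adafactor with a fixed learning rate of 1e-4 (relative-step mode disabled).
    optimizer = Adafactor(model.parameters(), lr=1e-4,
                          relative_step=False, scale_parameter=False)

    # Cosine schedule with a warm-up phase of 5% of the total training steps.
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.05 * total_steps),
        num_training_steps=total_steps)

    # Pre-processing: tokenize and truncate/pad inputs to a max length of 150.
    batch = tokenizer(["10 + 7 ="], padding="max_length", truncation=True,
                      max_length=150, return_tensors="pt")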