Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

Authors: Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of our approach over existing techniques in terms of length generalization, efficiency, and interpretability for symbolic operations. Furthermore, it can be applied to LMs across different model scales, outperforming tool-calling methods in arithmetic reasoning tasks while maintaining superior inference efficiency. Our work highlights the potential of seamlessly unifying explicit rule learning via CoNNs and implicit pattern learning in LMs, paving the way for true symbolic comprehension capabilities.
Researcher Affiliation | Academia | (1) The Laboratory of Cognition and Decision Intelligence for Complex Systems, IA, CAS; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) College of Electrical and Information Engineering, Hunan University
Pseudocode | Yes | THE TRACR CODE OF PARITY CONN:

    def parity(sop) -> rasp.SOp:
        """Multiply the length of each token."""
        sop = rasp.SequenceMap(lambda x, y: x * y, sop, length).named("map_length")
        """Add each bit."""
        out = rasp.numerical(rasp.Aggregate(
            rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.TRUE).named("Select"),
            rasp.numerical(rasp.Map(lambda x: x, sop).named("map_length")),
            default=0).named("Aggregate"))
        """Calculate whether the remainder of dividing it by 2 is odd or even."""
        out = rasp.Map(lambda x: 0 if x % 2 == 0 else 1, out).named("Zipmap")
        return out

(A runnable compilation sketch for this program is given after the table.)
Open Source Code | Yes | The code is released at: https://github.com/wengsyx/Neural-Comprehension.
Open Datasets | Yes | GSM8K: https://github.com/openai/grade-school-math; SingleEq: https://gitlab.cs.washington.edu/ALGES/TACL2015; AddSub: https://www.cs.washington.edu/nlp/arithmetic; MultiArith: http://cogcomp.cs.illinois.edu/page/resource_view/98; SVAMP: https://github.com/arkilpatel/SVAMP
Dataset Splits | Yes | Our experimental design encompasses 1000 × 40 independent test sets, comprising problems with digit lengths varying from 1 to 40. Problems with 10 to 20 digits within this range are provided by us as training data for the methods based on implicit learning; during the testing phase, this range is referred to as In-Dist. Furthermore, we present results for both the Scratchpad (Anil et al., 2022) and Algorithmic (Zhou et al., 2022b) approaches. (An illustrative construction of such splits is sketched after the table.)
Hardware Specification | Yes | Table 10 displays the parameter settings for the T5 models during training, which is conducted on four NVIDIA A6000 GPUs with 48GB of memory each. For the GLM-130B, we employ the FasterTransformer framework to set up local inference with INT4 on eight NVIDIA GeForce RTX 3090 GPUs with 24GB of memory each.
Software Dependencies | No | The paper mentions using the "PyTorch framework (Paszke et al., 2019)", the "Adafactor optimizer (Shazeer & Stern, 2018)", and the "JAX" and "RASP" frameworks, but it does not specify version numbers for any of these software components.
Experiment Setup | Yes | For the T5 models, we employ the standard fine-tuning approach using the pretrained models as a starting point. We follow the pre-processing steps in the original T5 paper, which involve setting the input text max length to 150 and using the tokenizer to process the data. We use a batch size of 64 for all models and the Adafactor optimizer (Shazeer & Stern, 2018) with a learning rate of 1 × 10⁻⁴. The models are trained for a maximum of 20 epochs. We use a cosine learning rate schedule with a warm-up phase comprising 5% of the total number of training steps. We employ a dropout rate of 0.1 during training to mitigate overfitting. (An illustrative configuration sketch follows the table.)
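
For the Pseudocode row, the following is a minimal sketch of how the quoted parity program could be compiled into concrete transformer weights with DeepMind's tracr library. The vocabulary, maximum sequence length, and BOS marker are illustrative assumptions rather than settings reported in the paper, and the sketch assumes the parity() definition from that row is already in scope.

    # Minimal sketch: compile the parity() RASP program quoted in the Pseudocode row.
    from tracr.compiler import compiling
    from tracr.rasp import rasp

    # "length" used inside parity(): the number of input tokens, obtained as the
    # width of an all-true selector over the token positions.
    length = rasp.SelectorWidth(
        rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)).named("length")

    # Compile the RASP program into an actual transformer (a compiled neural network).
    # vocab, max_seq_len and the BOS marker are illustrative choices.
    model = compiling.compile_rasp_to_model(
        parity(rasp.tokens), vocab={0, 1}, max_seq_len=40, compiler_bos="BOS")

    # Run the compiled model on a bit string; the decoded output carries the parity.
    print(model.apply(["BOS", 1, 0, 1, 1]).decoded)

Because the selector in the Aggregate step attends to every position, the aggregation computes a mean over the sequence; multiplying each bit by the length beforehand turns that mean into the bit sum, which the final Map reduces modulo 2.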
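For the Dataset Splits row, the sketch below shows one way length-graded test sets of this shape could be constructed for a symbolic task such as addition; the task, the per-length sample count of 1000, and the random generation are illustrative assumptions, not the paper's released data.

    # Illustrative sketch: 40 independent test sets (1-40 digit operands),
    # with the 10-20 digit range treated as In-Dist for implicit-learning baselines.
    import random

    def sample_number(n_digits: int) -> int:
        """Uniformly sample an integer with exactly n_digits digits."""
        lo = 10 ** (n_digits - 1) if n_digits > 1 else 0
        hi = 10 ** n_digits - 1
        return random.randint(lo, hi)

    def build_test_set(n_digits: int, n_samples: int = 1000):
        """One test set: n_samples addition problems with n_digits-digit operands."""
        problems = []
        for _ in range(n_samples):
            a, b = sample_number(n_digits), sample_number(n_digits)
            problems.append({"question": f"{a} + {b} =", "answer": str(a + b)})
        return problems

    test_sets = {n: build_test_set(n) for n in range(1, 41)}   # 40 test sets
    in_dist = {n: test_sets[n] for n in range(10, 21)}         # 10-20 digits: In-Dist
    out_of_dist = {n: s for n, s in test_sets.items() if n not in in_dist}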
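For the Experiment Setup row, the following is a minimal configuration sketch of the described T5 fine-tuning using the Hugging Face transformers implementations of Adafactor and the cosine warm-up schedule. The model size, dataset size, and training loop are assumptions; only the quoted hyperparameters (batch size 64, learning rate 1 × 10⁻⁴, 20 epochs, 5% warm-up, dropout 0.1, max input length 150) come from the paper.

    # Sketch of the quoted T5 fine-tuning hyperparameters; the model checkpoint and
    # dataset size below are placeholders, not values from the paper.
    from transformers import (Adafactor, T5ForConditionalGeneration,
                              T5TokenizerFast, get_cosine_schedule_with_warmup)

    model = T5ForConditionalGeneration.from_pretrained("t5-base", dropout_rate=0.1)
    tokenizer = T5TokenizerFast.from_pretrained("t5-base")

    batch_size, max_epochs, num_train_examples = 64, 20, 100_000   # dataset size assumed
    total_steps = (num_train_examples // batch_size) * max_epochs

    # Adafactor with a fixed learning rate of 1e-4 (relative-step mode disabled).
    optimizer = Adafactor(model.parameters(), lr=1e-4,
                          relative_step=False, scale_parameter=False)

    # Cosine schedule with a warm-up phase of 5% of the total training steps.
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.05 * total_steps),
        num_training_steps=total_steps)

    # Pre-processing: tokenize and truncate/pad inputs to a max length of 150.
    batch = tokenizer(["10 + 7 ="], padding="max_length", truncation=True,
                      max_length=150, return_tensors="pt")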