reproducibilityindex.ai

MathAttack: Attacking Large Language Models towards Math Solving Ability

Authors: Zihao Zhou, Qiufeng Wang, Mingyu Jin, Jie Yao, Jianan Ye, Wei Liu, Wei Wang, Xiaowei Huang, Kaizhu Huang

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on our RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math-solving ability of LLMs.
Researcher Affiliation	Academia	1 School of Advanced Technology, Xi'an Jiaotong-Liverpool University; 2 University of Liverpool; 3 Northwestern University; 4 ShanghaiTech University; 5 Duke Kunshan University
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.
Open Datasets	Yes	Two math word problem benchmark datasets, GSM8K (Cobbe et al. 2021) and MultiArith (Roy and Roth 2015), are adopted in the experiments. [...] The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.
Dataset Splits	No	The paper describes selecting subsets of the GSM8K and MultiArith datasets (307 and 150 MWP samples, respectively) for experiments, but it does not specify the train/validation/test splits, or their ratios, used within these datasets for model training or evaluation.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions spaCy as the NER model but does not specify its version number. The other software mentioned consists of models or APIs, also without specific version details.
Experiment Setup	Yes	We set the temperature = 0 to stabilize the output of LLMs. When attacking victim models, we attack them not only with a zero-shot prompt but also with a few-shot prompt. Specifically, we employ four MWP samples as shots and provide Chain-of-Thought (CoT) (Wei et al. 2022) annotations.
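
The prompt setup described in the Experiment Setup row could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Shot` class, the `build_prompt` function, the "Q:/A:" prompt layout, the `GENERATION_CONFIG` dictionary, and the four example shots are all assumptions made up here. Only the temperature of 0, the zero-shot/few-shot distinction, and the use of four CoT-annotated shots come from the text above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Shot:
    """One in-context example: a math word problem with a CoT annotation."""
    question: str
    rationale: str  # chain-of-thought reasoning text
    answer: str


# Temperature 0 is stated in the paper; the dict layout is an assumption.
GENERATION_CONFIG = {"temperature": 0}


def build_prompt(question: str, shots: Sequence[Shot] = ()) -> str:
    """Build a zero-shot prompt (no shots) or a few-shot CoT prompt."""
    blocks = [
        f"Q: {s.question}\nA: {s.rationale} The answer is {s.answer}."
        for s in shots
    ]
    # The target question comes last, with the answer left open for the model.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Placeholder shots invented for illustration; they are NOT from the paper.
EXAMPLE_SHOTS = [
    Shot("Tom has 3 apples and buys 2 more. How many apples does he have?",
         "Tom starts with 3 apples and adds 2, so 3 + 2 = 5.", "5"),
    Shot("A box holds 4 pens. How many pens are in 3 boxes?",
         "Each box holds 4 pens and there are 3 boxes, so 4 * 3 = 12.", "12"),
    Shot("Ann had 10 stickers and gave away 4. How many are left?",
         "Ann starts with 10 and gives away 4, so 10 - 4 = 6.", "6"),
    Shot("There are 8 cookies shared equally by 2 kids. How many does each get?",
         "8 cookies split between 2 kids is 8 / 2 = 4.", "4"),
]

zero_shot_prompt = build_prompt("Sam has 7 marbles and finds 5 more. How many now?")
few_shot_prompt = build_prompt(
    "Sam has 7 marbles and finds 5 more. How many now?", EXAMPLE_SHOTS
)
```

The same `build_prompt` call covers both settings, so an attacked question can be evaluated under zero-shot and few-shot conditions by changing only the `shots` argument.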