LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Authors: Xinyu Pi, Wanjun Zhong, Yan Gao, Nan Duan, Jian-Guang Lou

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both base- and large-size language models pre-trained with LogiGAN demonstrate clear performance improvements on 12 datasets requiring general reasoning abilities, revealing the fundamental role of logic in broad reasoning as well as the effectiveness of LogiGAN. Ablation studies on LogiGAN components reveal the relative orthogonality between linguistic and logical abilities and suggest that the facilitation effect of reflective thinking might also generalize to machine learning.
Researcher Affiliation | Collaboration | 1. University of Illinois Urbana-Champaign, Urbana, USA; 2. Sun Yat-sen University; 3. Microsoft Research Asia
Pseudocode | Yes | Algorithm 1: Adversarial Training Process (a hedged sketch of a generator/verifier loop of this kind is given after this table)
Open Source Code | Yes | The code is released at https://github.com/microsoft/ContextualSP/tree/master/logigan
Open Datasets | Yes | To test the effectiveness of LogiGAN, we extensively experiment on 12 datasets requiring general reasoning via natural language. Specifically, ReClor (Yu et al., 2020), LogiQA (Liu et al., 2021a), and Adversarial NLI (ANLI; Nie et al., 2019) focus especially on logical reasoning; TellMeWhy (Lal et al., 2021) on abductive reasoning; HotpotQA (Yang et al., 2018a) on multi-hop reasoning; QuoRef (Dasigi et al., 2019) on reasoning with co-reference resolution; MuTual (Cui et al., 2020), DREAM (Sun et al., 2019), and SAMSum (Gliwa et al., 2019) on reasoning in conversational scenarios; and NarrativeQA (Kočiský et al., 2018), RACE (Lai et al., 2017), and XSum (Narayan et al., 2018) on general verbal reasoning.
Dataset Splits | No | The paper mentions evaluating on 'development sets' in Table 1, but it does not provide specific training/validation/test split percentages or sample counts for any of the datasets used.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments; it only defers to Appendix D for implementation details.
Software Dependencies | No | The paper mentions using T5 and ALBERT-large models, but it does not specify version numbers for these models or for any underlying software frameworks (e.g., PyTorch, TensorFlow) or libraries used to implement the method.
Experiment Setup | No | The paper states, 'We leave discussions of the rest implementation details and hyper-parameter settings of pre-training and downstream fine-tuning in Appendix D.' This indicates that the specific experimental setup details are not present in the main text provided.
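
The pseudocode row above refers to the paper's Algorithm 1 (Adversarial Training Process). As a rough illustration only, the sketch below alternates a supervised update of a T5 generator that recovers masked logical conclusions with an update of a binary verifier that separates gold from generated conclusions. The model choices, the (premise, conclusion) data format, and the helper `adversarial_step` are assumptions made here for illustration; they are not taken from the paper or its released code.

```python
# Schematic generator/verifier adversarial pre-training step (illustration only).
# Model names, the data format, and all helpers below are assumptions, not the
# paper's actual Algorithm 1.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          T5ForConditionalGeneration, T5TokenizerFast)

gen_tok = T5TokenizerFast.from_pretrained("t5-base")
generator = T5ForConditionalGeneration.from_pretrained("t5-base")
ver_tok = AutoTokenizer.from_pretrained("albert-large-v2")
verifier = AutoModelForSequenceClassification.from_pretrained(
    "albert-large-v2", num_labels=2)

gen_opt = torch.optim.AdamW(generator.parameters(), lr=1e-4)
ver_opt = torch.optim.AdamW(verifier.parameters(), lr=1e-5)


def adversarial_step(premises, gold_conclusions):
    """One alternating update: the generator learns to recover the masked
    conclusion, then the verifier learns to tell gold from generated ones."""
    # 1) Generator update: ordinary seq2seq loss on the gold conclusion.
    enc = gen_tok(premises, return_tensors="pt", padding=True, truncation=True)
    labels = gen_tok(gold_conclusions, return_tensors="pt", padding=True,
                     truncation=True).input_ids
    labels[labels == gen_tok.pad_token_id] = -100  # ignore padding in the loss
    gen_loss = generator(**enc, labels=labels).loss
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()

    # 2) Sample conclusions from the current generator to use as negatives.
    with torch.no_grad():
        fake_ids = generator.generate(**enc, max_new_tokens=64)
    fakes = gen_tok.batch_decode(fake_ids, skip_special_tokens=True)

    # 3) Verifier update: binary classification of (premise, conclusion) pairs,
    #    with gold conclusions labeled 1 and generated ones labeled 0.
    pairs = [p + " " + c for p, c in
             zip(list(premises) + list(premises), list(gold_conclusions) + fakes)]
    ver_labels = torch.tensor([1] * len(gold_conclusions) + [0] * len(fakes))
    ver_in = ver_tok(pairs, return_tensors="pt", padding=True, truncation=True)
    ver_loss = verifier(**ver_in, labels=ver_labels).loss
    ver_opt.zero_grad()
    ver_loss.backward()
    ver_opt.step()
    return gen_loss.item(), ver_loss.item()
```

Note that gradients cannot flow from the verifier back to the generator through discrete generated text, so the paper's Algorithm 1 handles verifier-to-generator feedback differently than a plain alternating loop; the released code linked above is the authoritative reference for the exact procedure.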