Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Case-Based Reasoning with Language Models for Classification of Logical Fallacies
Authors: Zhivar Sourati, Filip Ilievski, Hông-Ân Sandlin, Alain Mermoud
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in in-domain and out-of-domain settings indicate that Case-Based Reasoning improves the accuracy and generalizability of language models. Our ablation studies suggest that representations of similar cases have a strong impact on the model performance, that models perform well with fewer retrieved cases, and that the size of the case database has a negligible effect on the performance. |
| Researcher Affiliation | Collaboration | (1) Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA; (2) Department of Computer Science, University of Southern California, Los Angeles, CA, USA; (3) Cyber-Defence Campus, armasuisse Science and Technology, Switzerland |
| Pseudocode | No | The paper describes the components of the CBR pipeline but does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code and data available to support future research on logical fallacy classification. (Footnote 1: https://github.com/zhpinkman/CBR) |
| Open Datasets | Yes | We use two logical fallacy datasets from [Jin et al., 2022], called LOGIC and LOGIC Climate. ...As LOGIC dataset is severely imbalanced, we augment its train split using two techniques, i.e., back-translation, and substitution of entities in the arguments with their synonymous terms. |
| Dataset Splits | No | The paper mentions 'train split' and 'test' datasets (LOGIC and LOGIC Climate) but does not explicitly detail a validation split or its size/proportion for reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions software like SimCSE, BERT, RoBERTa, and ELECTRA, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use SimCSE [Gao et al., 2021], a transformer-based retriever that is optimized for capturing overall sentence similarity, to compute the similarity between cases ( 2) and also use H = 8 heads for the multi-headed attention component. The depth of our classifier is d = 2. It uses gelu [Hendrycks and Gimpel, 2016] as an activation function. We analyze the performance of our model using k ∈ {1, 2, 3, 4, 5}. To test the generalization of our model with sparser case databases, we experiment with various ratios of the case database within {0.1, 0.4, 0.7, 1.0}. |
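
The Experiment Setup values quoted above can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the class name `CBRConfig`, its field names, and the helper `build_grid` are hypothetical, and crossing every value of k with every case-database ratio is an assumption (the paper may have varied them separately).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CBRConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    retriever: str = "SimCSE"      # retriever named in the paper
    attention_heads: int = 8       # H = 8
    classifier_depth: int = 2      # d = 2
    activation: str = "gelu"       # activation function of the classifier
    num_cases: int = 1             # k, number of retrieved cases
    case_db_ratio: float = 1.0     # fraction of the case database kept


# Values reported in the Experiment Setup row.
K_VALUES = (1, 2, 3, 4, 5)
CASE_DB_RATIOS = (0.1, 0.4, 0.7, 1.0)


def build_grid():
    """Return one config per (k, ratio) pair; the full cross product is an assumption."""
    return [
        CBRConfig(num_cases=k, case_db_ratio=ratio)
        for k, ratio in product(K_VALUES, CASE_DB_RATIOS)
    ]


grid = build_grid()
print(len(grid), "configurations")
```

Each entry in `grid` could then be passed to whatever training entry point the released repository exposes; that entry point is not specified here.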