Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SEMANTIFY: Unveiling Memes with Robust Interpretability beyond Input Attribution
Authors: Dibyanayan Bandyopadhyay, Asmit Ganguly, Baban Gain, Asif Ekbal
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation of SEMANTIFY using interpretability metrics, including leakage-adjusted simulatability, demonstrates its superiority over various baselines by up to 2.5 points. Human evaluation of relatedness and exhaustiveness of extracted keywords further validates its effectiveness. Additionally, a qualitative analysis of extracted keywords serves as a case study, unveiling model error cases and their reasons. |
| Researcher Affiliation | Academia | ¹Department of Computer Science and Engineering, Indian Institute of Technology, Patna, India; ²School of AI and Data Science, Indian Institute of Technology, Jodhpur, India |
| Pseudocode | Yes | Algorithm 1: Retrieve explainable keywords with four-step filtering |
| Open Source Code | Yes | Code and Supplementary Material available at: https://github.com/newcodevelop/semantify |
| Open Datasets | Yes | We use the Facebook Hateful Meme dataset [Kiela et al., 2021] for performing the experiments. |
| Dataset Splits | Yes | To ensure robust evaluation on simulatability, we conduct a 5-fold cross-validation for testing the surrogate models (Section 4.2) after running experiments for 3,500 steps on the respective train set. |
| Hardware Specification | Yes | All experiments were conducted on a single Nvidia A100 80GB GPU. |
| Software Dependencies | No | The paper mentions software like PyTorch, Python, GPT-2, and Huggingface transformers but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We employed the Adam optimizer [Kingma and Ba, 2017] with a learning rate of 0.005 for optimization. |
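The Research Type row cites leakage-adjusted simulatability (LAS). Below is a minimal sketch of the metric as defined by Hase et al. (2020). It assumes categorical labels and a simulator that has already produced three sets of predictions: from input plus explanation, from input alone, and from explanation alone (the leakage probe). The function name and argument layout are hypothetical. This is not the SEMANTIFY implementation, where the explanations are extracted keywords.

```python
from typing import Sequence


def leakage_adjusted_simulatability(
    model_preds: Sequence[int],
    sim_with_expl: Sequence[int],
    sim_without_expl: Sequence[int],
    sim_expl_only: Sequence[int],
) -> float:
    """LAS following the general definition of Hase et al. (2020).

    model_preds:      labels predicted by the model being explained
    sim_with_expl:    simulator predictions given input + explanation
    sim_without_expl: simulator predictions given input only
    sim_expl_only:    simulator predictions given explanation only
    """
    n = len(model_preds)
    if not (len(sim_with_expl) == len(sim_without_expl) == len(sim_expl_only) == n):
        raise ValueError("all prediction lists must have the same length")

    # Running sums of simulatability gain, keyed by leakage flag (0 or 1).
    gain = {0: 0.0, 1: 0.0}
    count = {0: 0, 1: 0}
    for y, s_xe, s_x, s_e in zip(model_preds, sim_with_expl, sim_without_expl, sim_expl_only):
        leak = int(s_e == y)  # the explanation alone reveals the model's label
        gain[leak] += int(s_xe == y) - int(s_x == y)
        count[leak] += 1

    # Average the gain within each leakage group, then across groups, so that
    # leaking explanations cannot dominate the score. Skipping empty groups is
    # an assumption of this sketch.
    group_means = [gain[g] / count[g] for g in (0, 1) if count[g] > 0]
    if not group_means:
        raise ValueError("no examples")
    return sum(group_means) / len(group_means)
```

For instance, with model predictions `[1, 0, 1, 0]`, simulator predictions `[1, 0, 1, 1]` (input plus explanation), `[0, 0, 1, 1]` (input only) and `[1, 1, 0, 0]` (explanation only), the leaking group has mean gain 0.5, the non-leaking group 0.0, and the function returns 0.25.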
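The Dataset Splits row reports 5-fold cross-validation for the surrogate models. As a hedged illustration of how such folds partition the data, here is a plain k-fold index splitter. The helper `k_fold_indices` is hypothetical and not taken from the SEMANTIFY repository, and contiguous, unshuffled folds are an assumption, since the quoted text does not say how folds were formed. The 3,500 training steps are not modeled.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds over range(n)."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # The first n % k folds get one extra example so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Each index appears in exactly one test fold, so every example is scored once by a surrogate that did not train on it.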
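The Experiment Setup row reports only the optimizer (Adam) and its learning rate (0.005). The sketch below is a from-scratch version of one Adam update with that learning rate. The values β1 = 0.9, β2 = 0.999 and ε = 1e-8 are common defaults and are assumptions here, because the paper does not report them. The helper `adam_step` is hypothetical.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 0.005,     # learning rate reported in the table
    beta1: float = 0.9,    # assumed default, not reported
    beta2: float = 0.999,  # assumed default, not reported
    eps: float = 1e-8,     # assumed default, not reported
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update (Kingma and Ba); t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)           # bias-corrected moments
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step the bias-corrected ratio is roughly the sign of the gradient, so each parameter moves by about 0.005. In PyTorch, the library the paper mentions, the same setting would be `torch.optim.Adam(model.parameters(), lr=0.005)` if the default betas and eps were kept, which is not confirmed by the source.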