Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Understanding Interlocking Dynamics of Cooperative Rationalization
Authors: Mo Yu, Yang Zhang, Shiyu Chang, Tommi Jaakkola
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on two synthetic benchmarks and two real datasets demonstrate that A2R can significantly alleviate the interlock problem and find explanations that better align with human judgments. |
| Researcher Affiliation | Collaboration | Mo Yu¹, Yang Zhang¹, Shiyu Chang¹,², Tommi S. Jaakkola³ (¹MIT-IBM Watson AI Lab; ²UC Santa Barbara; ³CSAIL MIT) |
| Pseudocode | No | The paper describes its methods in text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code at https://github.com/Gorov/Understanding_Interlocking. |
| Open Datasets | Yes | Beer Advocate from [32] is a multi-aspect sentiment prediction dataset, which has been commonly used in the field of rationalization [6, 11, 27, 46]. This dataset includes sentence-level annotations, where each sentence is annotated with one or multiple aspect labels. The Movie Review dataset is from the Eraser benchmark [16]. Movie Review is a sentiment prediction dataset that contains phrase-level rationale annotations. The Movie Review data is publicly available at http://www.eraserbenchmark.com/. |
| Dataset Splits | No | The aforementioned hyperparameters and the best models to report are selected according to the development set accuracy. |
| Hardware Specification | Yes | Every compared model is trained on a single V100 GPU. |
| Software Dependencies | No | The paper mentions "Adam [24] as the default optimizer" and "100-dimension Glove embeddings [34]" but does not provide specific version numbers for any software or libraries. |
| Experiment Setup | Yes | We use Adam [24] as the default optimizer with a learning rate of 0.001. The policy gradient update uses a learning rate of 1e-4. The exploration rate is 0.2. |
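The training settings quoted in the table can be gathered into one configuration object, which makes it easier to check a re-run against what the paper reports. The sketch below is a hypothetical Python illustration, not code from the authors' repository. The class name `ReportedSetup`, its field names, and the helper `select_best_checkpoint` are assumptions. Only the numeric values come from the table: Adam with learning rate 0.001, a policy-gradient learning rate of 1e-4, an exploration rate of 0.2, and 100-dimension GloVe embeddings. The same applies to the rule of selecting models by development-set accuracy.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReportedSetup:
    """Hypothetical container for the hyperparameters quoted in the table.

    Field names are illustrative assumptions; the values are the ones the
    paper reports.
    """
    optimizer: str = "adam"           # "Adam ... as the default optimizer"
    learning_rate: float = 1e-3       # default optimizer learning rate
    policy_gradient_lr: float = 1e-4  # learning rate of the policy-gradient update
    exploration_rate: float = 0.2     # reported exploration rate
    embedding_dim: int = 100          # 100-dimension GloVe embeddings


def select_best_checkpoint(dev_accuracy: Dict[str, float]) -> str:
    """Return the checkpoint name with the highest development-set accuracy.

    Mirrors the stated selection rule; ties resolve to the first name
    encountered in the dict's insertion order.
    """
    if not dev_accuracy:
        raise ValueError("no checkpoints to choose from")
    return max(dev_accuracy, key=lambda name: dev_accuracy[name])


if __name__ == "__main__":
    setup = ReportedSetup()
    print(setup)
    # Hypothetical dev-set scores, for illustration only (not paper results).
    print(select_best_checkpoint({"epoch_1": 0.81, "epoch_2": 0.86, "epoch_3": 0.84}))  # prints epoch_2
```

Freezing the dataclass keeps the reported values from being mutated accidentally during a re-run; any deviation then has to be made explicitly, for example with `dataclasses.replace`.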