Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Mask and Infill: Applying Masked Language Model for Sentiment Transfer
Authors: Xing Wu, Tao Zhang, Liangjun Zang, Jizhong Han, Songlin Hu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on two review datasets with quantitative, qualitative, and human evaluations. Experimental results demonstrate that our models improve state-of-the-art performance. |
| Researcher Affiliation | Academia | Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Implementation of Mask and Infill approach. |
| Open Source Code | No | The paper does not provide a direct link to its own source code or explicitly state that its code is released. The provided links are for baseline models or evaluation tools. |
| Open Datasets | Yes | We experiment our methods on two review datasets from [Li et al., 2018]: Yelp and Amazon [He and McAuley, 2016] |
| Dataset Splits | Yes | We experiment our methods on two review datasets from [Li et al., 2018]: Yelp and Amazon [He and McAuley, 2016], each of which is randomly split into training, validation and testing sets. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions using 'pre-trained BERT-base' and 'a CNN-based classifier' but does not specify software dependencies with version numbers (e.g., PyTorch 1.x.x, TensorFlow 2.x.x). |
| Experiment Setup | Yes | The input size is kept compatible with original BERT and relevant hyperparameters can be found in [Devlin et al., 2018]. The pre-trained discriminator is a CNN-based classifier [Kim, 2014] with convolutional filters of size 3, 4, 5 and use WordPiece embeddings. The hyperparameters in Equation 10 and 11 are selected by a grid-search method using the validation set. We fine-tune BERT to AC-MLM for 10 epochs, and further train 6 epochs to apply discriminator constraint. |
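
The two stages named in the title (mask sentiment-bearing tokens, then infill them conditioned on the target sentiment) can be illustrated with a minimal, dependency-free Python sketch. It rests on loudly stated assumptions and is not the authors' Algorithm 1. The mask stage here scores tokens with a smoothed frequency ratio between the source-sentiment and target-sentiment corpora, a heuristic swapped in from the attribute-marker idea of [Li et al., 2018]; the paper's own masking criterion may differ. The infill stage in the paper is a BERT model fine-tuned into AC-MLM. Here it is replaced by a hypothetical fixed lookup `fillers`. The function names, the smoothing value, the threshold of 2.0, and the toy corpora are all illustrative.

```python
from collections import Counter

MASK = "[MASK]"


def salience_scores(src_corpus, tgt_corpus, smoothing=1.0):
    """Smoothed ratio of each token's count in the source-sentiment corpus to its
    count in the target-sentiment corpus (heuristic borrowed from Li et al., 2018)."""
    src_counts = Counter(tok for sent in src_corpus for tok in sent.split())
    tgt_counts = Counter(tok for sent in tgt_corpus for tok in sent.split())
    return {
        tok: (src_counts[tok] + smoothing) / (tgt_counts[tok] + smoothing)
        for tok in src_counts
    }


def mask_step(sentence, scores, threshold):
    """Mask stage: replace tokens strongly tied to the source sentiment with [MASK]."""
    return [MASK if scores.get(tok, 0.0) >= threshold else tok for tok in sentence.split()]


def infill_step(tokens, target, fillers):
    """Infill stage. HYPOTHETICAL stand-in for AC-MLM: every mask is filled with
    one fixed word chosen for the target sentiment, with no language model involved."""
    return " ".join(fillers[target] if tok == MASK else tok for tok in tokens)


def transfer(sentence, scores, threshold, target, fillers):
    """Run mask then infill on a single whitespace-tokenized sentence."""
    return infill_step(mask_step(sentence, scores, threshold), target, fillers)


if __name__ == "__main__":
    # Toy corpora (illustrative only, not from the Yelp or Amazon datasets).
    negative = ["the food was bad", "service was terrible", "bad place"]
    positive = ["the food was great", "service was friendly", "great place"]
    scores = salience_scores(negative, positive)
    fillers = {"positive": "great"}  # hypothetical target-sentiment vocabulary
    print(transfer("the food was bad", scores, 2.0, "positive", fillers))
```

On these toy corpora, "bad" scores 3.0 and "terrible" scores 2.0, while content words shared by both corpora ("food", "was") score 1.0 and are kept. This separation of attribute words from content is what lets the infill stage change sentiment while preserving the rest of the sentence. A real AC-MLM would instead predict each masked position from its context and the target label.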