Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DCN+: Mixed Objective And Deep Residual Coattention for Question Answering
Authors: Caiming Xiong, Victor Zhong, Richard Socher
ICLR 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the Stanford Question Answering Dataset, our model achieves state-of-the-art results with 75.1% exact match accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy and 86.0% F1. We train and evaluate our model on the Stanford Question Answering Dataset (SQuAD). We show our test performance of our model against other published models, and demonstrate the importance of our proposals via ablation studies on the development set. |
| Researcher Affiliation | Industry | Caiming Xiong, Victor Zhong, Richard Socher; Salesforce Research, Palo Alto, CA 94301, USA; EMAIL |
| Pseudocode | No | The paper includes figures illustrating network architecture (Figure 1) and computation flow (Figure 2), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps for a method in a code-like format. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We train and evaluate our model on the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016) |
| Dataset Splits | Yes | We train and evaluate our model on the Stanford Question Answering Dataset (SQuAD). We show our test performance of our model against other published models, and demonstrate the importance of our proposals via ablation studies on the development set. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' for implementation and 'ADAM' for optimization, and uses 'the reversible tokenizer from Stanford CoreNLP', but it does not specify version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | The model is trained using ADAM (Kingma & Ba, 2014) with default hyperparameters. Hyperparameters of our model are identical to the DCN. We implement our model using PyTorch. We perform word dropout on the document which zeros a word embedding with probability 0.075. |
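
The word dropout mentioned in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' PyTorch implementation: the function name `word_dropout`, the list-of-lists embedding representation, and the `rng` parameter are assumptions made for illustration. The quoted text only states that a word embedding is zeroed with probability 0.075; whether the surviving embeddings are rescaled is not stated, so this sketch leaves them unchanged.

```python
import random


def word_dropout(embeddings, p=0.075, rng=None):
    """Zero each word's entire embedding vector with probability p.

    `embeddings` is a list of per-word vectors (lists of floats).
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if rng is None:
        rng = random.Random()
    dropped = []
    for vec in embeddings:
        if rng.random() < p:
            # Drop the whole word: replace its vector with zeros.
            dropped.append([0.0] * len(vec))
        else:
            # Keep the word unchanged (no rescaling is assumed here).
            dropped.append(list(vec))
    return dropped


# Usage: a toy "document" of three words with 2-dimensional embeddings.
document = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
augmented = word_dropout(document, p=0.075, rng=random.Random(0))
```

Unlike element-wise dropout, this variant removes whole words, so the decision is made once per vector. It would be applied only during training, on the document side as the quoted setup describes.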