Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On Tracking Dialogue State by Inheriting Slot Values in Mentioned Slot Pools

Authors: Zhoujian Sun, Zhengxing Huang, Nai Ding

IJCAI 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conducted experiments on three annotated DST datasets, i.e., MultiWOZ 2.1, MultiWOZ 2.2, and WOZ 2.0, respectively [Eric et al., 2020; Zang et al., 2020; Wen et al., 2017]. Experimental results showed our model reached state-of-the-art DST performance on MultiWOZ datasets.
Researcher Affiliation	Collaboration	(1) Zhejiang Lab; (2) Zhejiang University. EMAIL, EMAIL
Pseudocode	No	The paper describes the model's structure and operations using text and mathematical equations, but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	We released the source code of this paper at https://github.com/ZJLAB-AMMI/msp.
Open Datasets	Yes	We conducted experiments on three annotated DST datasets, i.e., MultiWOZ 2.1, MultiWOZ 2.2, and WOZ 2.0, respectively [Eric et al., 2020; Zang et al., 2020; Wen et al., 2017].
Dataset Splits	Yes	Early stopping was employed based on the JGA of the development set.
Hardware Specification	No	The paper describes the architecture and parameter count of the BERT models used but does not specify the type of GPUs, CPUs, or other hardware on which the experiments were run.
Software Dependencies	No	The paper mentions using BERT, the Adam optimizer, and WordPiece embeddings but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	The maximum input sequence length was set to 512 tokens after tokenization. The weights α, β, and γ were 0.6, 0.2, and 0.2, respectively. The initial learning rate was set to 1e-5, and the total epoch number was set to 20. We conducted training with a warmup proportion of 10% and let the learning rate decay linearly after the warmup phase. Early stopping was employed based on the JGA of the development set.
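
The learning-rate schedule quoted in the Experiment Setup row (peak rate 1e-5, 10% warmup, linear decay afterwards) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name linear_warmup_decay_lr and the total_steps argument are hypothetical, and the sketch assumes that warmup ramps up from zero and that the decay reaches zero at the final step, neither of which is stated in the quoted text.

```python
def linear_warmup_decay_lr(step, total_steps, base_lr=1e-5, warmup_proportion=0.1):
    """Return the learning rate at a 0-indexed optimizer step.

    Hypothetical helper: base_lr and warmup_proportion follow the quoted
    setup; the start-at-zero and end-at-zero behavior are assumptions.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 1000  # assumed number of optimizer steps, for illustration only
    for s in (0, 50, 100, 550, 1000):
        print(s, linear_warmup_decay_lr(s, total))
```

In a real training loop, total_steps would typically be the number of batches per epoch multiplied by the 20 epochs mentioned above, with early stopping on development-set JGA possibly ending the run before the schedule completes.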