Dialogue Disentanglement in Software Engineering: How Far are We?

Authors: Ziyou Jiang, Lin Shi, Celia Chen, Jun Hu, Qing Wang

IJCAI 2021

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental
LLM Response: In this paper, we conduct an exploratory study on 7,226 real-world developers' dialogs mined from eight popular open-source projects hosted on Gitter. First, we compare five state-of-the-art dialog disentanglement approaches based on two strategies: transferring the original models across domains and retraining the models on software-related dialogs.

Researcher Affiliation: Academia
LLM Response: Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences; State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China; Occidental College, Los Angeles, California, USA.

Pseudocode: No
LLM Response: The paper describes methods and calculations in text and mathematical formulas, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.

Open Source Code: No
LLM Response: We release a dataset (footnote 2) of disentangled software-related dialogs to facilitate the replication of our study and future improvements of disentanglement models. Footnote 2 links to https://github.com/disensoftware/disentanglement-for-software, which is specified as a dataset, not the source code for the methodology.

Open Datasets: Yes
LLM Response: We release a dataset (footnote 2) of disentangled software-related dialogs to facilitate the replication of our study and future improvements of disentanglement models. Footnote 2 links to https://github.com/disensoftware/disentanglement-for-software.

Dataset Splits: Yes
LLM Response: For each SOTA model, we retrain with seven projects and evaluate with the eighth project.

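As an illustration of this leave-one-project-out protocol (not the authors' code), the loop below retrains on seven projects and evaluates on the held-out eighth; the retrain and evaluate callables are hypothetical placeholders.

    def run_leave_one_project_out(dialogs_by_project, retrain, evaluate):
        # dialogs_by_project: dict mapping each of the eight project names to its dialogs
        scores = {}
        for held_out, test_dialogs in dialogs_by_project.items():
            train_dialogs = [d for project, dialogs in dialogs_by_project.items()
                             if project != held_out for d in dialogs]
            model = retrain(train_dialogs)                    # retrain a SOTA model on seven projects
            scores[held_out] = evaluate(model, test_dialogs)  # evaluate on the eighth project
        return scores
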
Hardware Specification: No
LLM Response: The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud instance types.

Software Dependencies: No
LLM Response: The paper mentions using spaCy for tokenization but does not provide its version number or any other specific software dependencies with their respective version numbers.

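For context only, tokenizing a chat message with spaCy typically looks like the sketch below; the en_core_web_sm model name is an assumption, since the paper names spaCy but specifies neither a version nor a model.

    import spacy

    # Assumed English pipeline; the paper does not state which spaCy model or version was used.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("has anyone hit this NullPointerException after upgrading?")
    tokens = [token.text for token in doc]
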
Experiment Setup: Yes
LLM Response: The FF model is a feedforward neural network with two layers, 256-dimensional hidden vectors, and softsign nonlinearities. The BiLSTM model is a bidirectional recurrent neural network with a maximum context size of 160 and 200 neurons in one hidden layer. The BERT model uses... with 512 embedding size and 256 hidden units. The E2E model performs... with 512 embedding size, 256 hidden neurons, and a 0.05 noise ratio. The Ptr-Net model utilizes... with 512 embedding size and 256 hidden units. ...convergences are already achieved at epoch 10-15...
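
To make the quoted hyperparameters concrete, here is a minimal sketch of the FF baseline only, assuming PyTorch, a placeholder input feature size, and a single-score output head; none of this is taken from the authors' implementation.

    import torch.nn as nn

    # Minimal sketch of the FF baseline as quoted above: two layers,
    # 256-dimensional hidden vectors, softsign nonlinearities.
    INPUT_DIM = 77  # hypothetical placeholder; the quoted text gives no input feature size
    ff_model = nn.Sequential(
        nn.Linear(INPUT_DIM, 256),
        nn.Softsign(),
        nn.Linear(256, 256),
        nn.Softsign(),
        nn.Linear(256, 1),  # assumed output head: one score per candidate reply-to link
    )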