Dialogue Disentanglement in Software Engineering: How Far are We?
Authors: Ziyou Jiang, Lin Shi, Celia Chen, Jun Hu, Qing Wang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct an exploratory study on 7,226 real-world developer dialogs mined from eight popular open-source projects hosted on Gitter. First, we compare five state-of-the-art dialog disentanglement approaches based on two strategies: transferring the original models across domains and retraining the models on software-related dialogs. |
| Researcher Affiliation | Academia | Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China; Occidental College, Los Angeles, California, USA |
| Pseudocode | No | The paper describes methods and calculations in text and mathematical formulas, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | We release a dataset2 of disentangled software-related dialogs to facilitate the replication of our study and future improvements of disentanglement models. Footnote 2 links to https://github.com/disensoftware/disentanglement-for-software, which is specified as a dataset, not the source code for the methodology. |
| Open Datasets | Yes | We release a dataset2 of disentangled software-related dialogs to facilitate the replication of our study and future improvements of disentanglement models. Footnote 2 links to https://github.com/disensoftware/disentanglement-for-software. |
| Dataset Splits | Yes | For each SOTA model, we retrain with seven projects and evaluate with the eighth project (a leave-one-out sketch of this protocol follows the table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using spaCy for tokenization but does not provide its version number or any other specific software dependencies with their respective version numbers. |
| Experiment Setup | Yes | FF model is a feedforward neural network with two layers, 256-dimensional hidden vectors, and softsign nonlinearities (a minimal sketch of this baseline follows the table). BiLSTM model is a bidirectional recurrent neural network with a maximum context size of 160 and 200 neurons in one hidden layer. BERT model uses... with 512 embedding size and 256 hidden units. E2E model performs... with 512 embedding size, 256 hidden neurons and 0.05 noise ratio. Ptr-Net model utilizes... with 512 embedding size and 256 hidden units. ...convergence is already achieved at epochs 10-15... |
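
To make the leave-one-out protocol in the Dataset Splits row concrete, here is a minimal sketch. The project names and the retrain/evaluate helpers are hypothetical placeholders, not artifacts released with the paper; only the "retrain on seven projects, evaluate on the held-out eighth" structure comes from the text above.

```python
# Hypothetical illustration of the cross-project, leave-one-out protocol:
# each disentanglement model is retrained on seven Gitter projects and
# evaluated on the remaining one.
projects = [f"project_{i}" for i in range(1, 9)]  # eight open-source Gitter projects (placeholder names)

def leave_one_out_splits(projects):
    """Yield (train_projects, test_project) pairs, one per held-out project."""
    for held_out in projects:
        train = [p for p in projects if p != held_out]
        yield train, held_out

for train_projects, test_project in leave_one_out_splits(projects):
    # model = retrain_disentanglement_model(train_projects)  # placeholder helper
    # scores = evaluate(model, test_project)                 # placeholder helper
    print(f"retrain on {len(train_projects)} projects, evaluate on {test_project}")
```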
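The FF baseline quoted in the Experiment Setup row can be rendered as a short sketch. This is a hypothetical PyTorch version, assuming an arbitrary 512-dimensional input feature vector and a single link-scoring output; only the two 256-dimensional hidden layers with softsign nonlinearities are taken from the description above.

```python
import torch
import torch.nn as nn

class FeedForwardDisentangler(nn.Module):
    """Sketch of the FF baseline: two 256-dim hidden layers with softsign."""

    def __init__(self, input_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Softsign(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Softsign(),
            nn.Linear(hidden_dim, 1),  # score: does this message link to a candidate prior message? (assumed output)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: score a batch of 4 candidate message-pair feature vectors.
scores = FeedForwardDisentangler()(torch.randn(4, 512))
```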