End-to-End Deep Reinforcement Learning for Conversation Disentanglement

Authors: Karan Bhukar, Harshit Kumar, Dinesh Raghu, Ajay Gupta

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on the Ubuntu IRC dataset, we demonstrate that the proposed RL model improves the performance on both link-level and conversation-level metrics. We evaluate the proposed RL based approach on the widely used Ubuntu IRC dataset (Kummerfeld et al. 2018). We find that our RL based approach that uses our novel TL-FBC metric as reward is significantly better than baselines on both link-level and conversation-level metrics.
Researcher Affiliation | Industry | Karan Bhukar (1), Harshit Kumar (1), Dinesh Raghu (1), Ajay Gupta* (2); (1) IBM Research, (2) Meta; karan.bhukar1@ibm.com, harshitk@in.ibm.com, diraghu1@in.ibm.com, guptaajay@fb.com
Pseudocode | No | The paper describes its methodology using text and mathematical equations but does not contain a structured pseudocode or algorithm block with a clear label such as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | https://github.com/karan121bhukar/RL-ConvDisentanglement
Open Datasets | Yes | We use Ubuntu IRC (Internet Relay Chat) (Kummerfeld et al. 2018), the most widely used conversation disentanglement dataset, for our experiments.
Dataset Splits | Yes | Table 1: Statistics of Ubuntu IRC dataset. Train: 220,463 messages; Dev: 12,500 messages; Test: 15,000 messages.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments, only mentioning the use of PyTorch.
Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al. 2019)' but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | The hyper-parameters that give the best results have the learning rate and input sequence length set to 5e-6 and 128, respectively. We set the number of trajectories N and the candidate parents window size w to 10 and 50, respectively.
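
For readers who want to reuse the reported setup, the values above can be collected in a small configuration object. The following is a minimal Python sketch, not code from the authors' repository: the names ExperimentConfig, SPLIT_SIZES, candidate_parents, and split_fractions are hypothetical. It also assumes that the candidate-parent window for a message consists of the message itself plus the messages immediately before it, w candidates in total; the summary above does not say whether the paper defines the window this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyper-parameters as reported in the Experiment Setup row (field names are hypothetical)."""
    learning_rate: float = 5e-6
    input_sequence_length: int = 128
    num_trajectories: int = 10   # N in the paper
    parent_window: int = 50      # w in the paper


# Message counts per split, as reported in Table 1 of the paper.
SPLIT_SIZES = {"train": 220_463, "dev": 12_500, "test": 15_000}


def candidate_parents(index: int, window: int) -> list[int]:
    """Return candidate parent indices for the message at `index`.

    Assumption: the window holds the message itself plus up to
    `window - 1` immediately preceding messages.
    """
    start = max(0, index - window + 1)
    return list(range(start, index + 1))


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total message count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    config = ExperimentConfig()
    print(config)
    print(len(candidate_parents(100, config.parent_window)))
    print(split_fractions(SPLIT_SIZES))
```

Freezing the dataclass keeps the reported values from being changed by accident during a run; any other hyper-parameter the paper does not report is deliberately left out rather than guessed.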