Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
HEAL: A Knowledge Graph for Distress Management Conversations
Authors: Anuradha Welivita, Pearl Pu | Pages: 11459-11467
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Statistical and visual analysis conducted on HEAL reveals emotional dynamics between speakers and listeners in distress-oriented conversations and identifies useful response patterns leading to emotional relief. Automatic and human evaluation experiments show that HEAL's responses are more diverse, empathetic, and reliable compared to the baselines. |
| Researcher Affiliation | Academia | Anuradha Welivita, Pearl Pu School of Computer and Communication Sciences École Polytechnique Fédérale de Lausanne Switzerland EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data available at github.com/anuradha1992/HEAL. |
| Open Datasets | Yes | Thus, we curated a new dataset from Reddit, containing dialogues that discuss real-world distressful situations. We used the Pushshift API (Baumgartner et al. 2020) to collect and process dialogue threads from a carefully selected set of 8 subreddits: mentalhealthsupport; offmychest; sad; suicidewatch; anxietyhelp; depression; depressed; and depression_help... We used 80% of the dialogues to derive the knowledge graph and retained 10% of the dialogues each for validation and testing downstream tasks. |
| Dataset Splits | Yes | We used 80% of the dialogues to derive the knowledge graph and retained 10% of the dialogues each for validation and testing downstream tasks. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., Sentence BERT, BART, GPT-2, NLTK, RoBERTa, vis.js) but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We experimented with 8 similarity thresholds from 0.6 to 0.95 with 0.05 increments to cluster distress narratives. ... This resulted in selecting an optimal threshold of 0.85. ... we selected 0.7, 0.75, and 0.7 as the optimal thresholds for clustering expectations, responses and feedback, respectively. |
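
The split and threshold sweep quoted in the table can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `threshold_grid` and `split_dialogues`, the random shuffle, and the seed are assumptions made for illustration, since the quoted text does not say how dialogues were assigned to each partition.

```python
import random


def threshold_grid(start=0.6, stop=0.95, step=0.05):
    """Return the similarity thresholds from start to stop, inclusive."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(n)]


def split_dialogues(dialogues, seed=0):
    """Shuffle and split into 80% graph-building, 10% validation, 10% test.

    Assumption: a seeded random shuffle; the source does not state the
    assignment procedure.
    """
    items = list(dialogues)
    random.Random(seed).shuffle(items)
    n_graph = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    return (
        items[:n_graph],
        items[n_graph:n_graph + n_val],
        items[n_graph + n_val:],
    )


# Eight candidate thresholds, matching the sweep described in the table.
thresholds = threshold_grid()

# Placeholder dialogue IDs stand in for the real Reddit threads.
graph_set, val_set, test_set = split_dialogues(range(1000))
```

Each threshold in `thresholds` would then be passed to whatever clustering routine is used, with the reported optima (0.85 for distress narratives; 0.7, 0.75, and 0.7 for expectations, responses, and feedback) chosen from that grid.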