HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning
Authors: Jinjie Ni, Vlad Pandelea, Tom Young, Haicang Zhou, Erik Cambria (pp. 11112-11120)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that HiTKG achieves a significant improvement in the performance of turn-level goal learning compared with state-of-the-art baselines. Additionally, both automatic and human evaluation prove the effectiveness of the two-hierarchy learning framework for both short-term and long-term goal planning. |
| Researcher Affiliation | Academia | Jinjie Ni, Vlad Pandelea, Tom Young, Haicang Zhou, Erik Cambria Nanyang Technological University, Singapore {jinjie001, yang0552, haicang001}@e.ntu.edu.sg, {vlad.pandelea, cambria}@ntu.edu.sg |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code (no repository link, explicit code release statement, or code in supplementary materials). |
| Open Datasets | Yes | We conduct our evaluation on Open Dial KG (Moon et al. 2019). It is a dialogue KG dataset where each utterance of a dialogue is annotated with a KG path, which enables learning graph walkers to reason over the KG based on the conversations. |
| Dataset Splits | Yes | We follow the baselines and split it into train (70%), dev (15%), and test set (15%). |
| Hardware Specification | Yes | We use Pytorch (Paszke et al. 2019) to implement our model, which is trained on two RTX 8000 GPUs. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify a version number. Other software like 'ALBERT' and 'A2C' are also mentioned without version details. Thus, specific ancillary software details with version numbers are not provided. |
| Experiment Setup | Yes | We tune the hyperparameters by grid searching the hyperparameter space and choose the following settings that perform best: number of encoder/decoder layers: 2/6; dimension of the KG walker: 768; dimension of the KG embedding: 384 (stage 1), 256 (stage 2); loss coefficients γ/λ: 0.1/0.9; number of attention heads: 12; learning rate: 10^-3; dropout rate: 0.1; L2 regularization parameter ϵ: 10^-5; batch size: 10. We use a learning rate scheduler to tune the learning rate manually, and patience-based early stopping to avoid overfitting. In addition, we use gradient clipping to avoid gradient explosions. |
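Since no code is released, the reported configuration can only be reconstructed as a sketch. The snippet below collects the hyperparameters quoted above into a config dict and shows a 70/15/15 split helper matching the stated dataset splits; all names (`HPARAMS`, `split_dataset`, the random seed) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical reconstruction of the reported training configuration.
# The paper provides no source code, so names and structure here are
# assumptions for illustration only.
import random

HPARAMS = {
    "encoder_layers": 2,          # reported encoder depth
    "decoder_layers": 6,          # reported decoder depth
    "walker_dim": 768,            # dimension of the KG walker
    "kg_embed_dim_stage1": 384,   # KG embedding, stage 1
    "kg_embed_dim_stage2": 256,   # KG embedding, stage 2
    "loss_gamma": 0.1,            # loss coefficient gamma
    "loss_lambda": 0.9,           # loss coefficient lambda
    "attention_heads": 12,
    "learning_rate": 1e-3,
    "dropout": 0.1,
    "l2_epsilon": 1e-5,           # L2 regularization parameter
    "batch_size": 10,
}

def split_dataset(examples, seed=0):
    """Shuffle and split into 70% train / 15% dev / 15% test,
    following the split ratios reported in the paper."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n = len(examples)
    n_train = int(0.70 * n)
    n_dev = int(0.15 * n)
    train = [examples[i] for i in idx[:n_train]]
    dev = [examples[i] for i in idx[n_train:n_train + n_dev]]
    test = [examples[i] for i in idx[n_train + n_dev:]]
    return train, dev, test
```

With a fixed seed the split is deterministic, which is one conventional way to make a 70/15/15 partition reproducible when the original split indices are not published.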