Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Authors: Hung Le, Nancy F. Chen, Steven C.H. Hoi
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate the effectiveness of our method and provide additional insights on how models use semantic dependencies in a dialogue context to retrieve visual cues. |
| Researcher Affiliation | Collaboration | Hung Le (Singapore Management University, hungle.2018@smu.edu.sg); Nancy F. Chen (A*STAR, Institute for Infocomm Research, nfychen@i2r.a-star.edu.sg); Steven C.H. Hoi (Salesforce Research Asia, shoi@salesforce.com) |
| Pseudocode | Yes | Algorithm 1: Compositional semantic graph of dialogue context |
| Open Source Code | No | The paper does not include an unambiguous statement or a direct link to the source code for the methodology described. |
| Open Datasets | Yes | We use the Audio-Visual Scene-Aware Dialogue (AVSD) benchmark developed by Alamri et al. (2019). |
| Dataset Splits | Yes | #Dialogs: Train 7,659 / Val 1,787 / Test@DSTC7 1,710 / Test@DSTC8 1,710; #Questions/Answers: 153,180 / 35,740 / 13,490 / 18,810; #Words: 1,450,754 / 339,006 / 110,252 / 162,226 |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | Yes | We first employ a co-reference resolution system, e.g. (Clark & Manning, 2016). We then explore using the Stanford parser system [1] to discover sub-nodes. The parser decomposes each sentence into grammatical components, where a word and its modifier are connected in a tree structure. ... [1] v3.9.2 retrieved at https://nlp.stanford.edu/software/lex-parser.shtml ... word2vec embeddings [2] and compute the cosine similarity score. ... [2] https://code.google.com/archive/p/word2vec/ ... We experiment with the Adam optimizer (Kingma & Ba, 2015). (See the graph-construction sketch after this table.) |
| Experiment Setup | Yes | We experiment with the Adam optimizer (Kingma & Ba, 2015). The models are trained with a warm-up learning rate period of 5 epochs before the learning rate decays, and training runs for up to 50 epochs. The best model is selected by the average loss on the validation set. All model parameters, except the decoder parameters when using pre-trained language models, are initialized with a uniform distribution (Glorot & Bengio, 2010). The Transformer hyper-parameters are fine-tuned by validation results over d = {128, 256}, h = {1, 2, 4, 8, 16}, and a dropout rate from 0.1 to 0.5. Label smoothing (Szegedy et al., 2016) is applied on the labels of Ât (label smoothing does not help when optimizing over R̂t, as those labels are limited by the maximum length of dialogues, i.e. 10 in AVSD). (See the training-setup sketch after this table.) |
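
The Software Dependencies row quotes the paper's graph-construction ingredients: a dependency parse that connects each word with its modifier, plus word2vec cosine similarity to link semantically related node words. The following is a minimal Python sketch of that similarity-linking step only; the embedding table, the dependency edges, and the 0.5 threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Hypothetical embedding table standing in for the pretrained word2vec vectors
# referenced in the paper's footnote [2]; values here are random placeholders.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=300) for w in ["man", "person", "kitchen", "cooking"]}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity score between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Dependency edges as a parser such as the Stanford parser might produce
# (hypothetical output: each edge connects a word with its modifier).
dependency_edges = [("man", "kitchen"), ("man", "cooking")]

# Assumed similarity threshold; the quoted text does not specify a value.
SIM_THRESHOLD = 0.5

def build_semantic_edges(words, dep_edges):
    """Union of parser edges and word2vec similarity edges between node words."""
    edges = set(dep_edges)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= SIM_THRESHOLD:
                edges.add((a, b))
    return edges

print(build_semantic_edges(list(embeddings), dependency_edges))
```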
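
The Experiment Setup row describes training with Adam, a 5-epoch learning-rate warm-up followed by decay, up to 50 epochs, and label smoothing on the answer labels. The sketch below shows one way to wire those pieces together, assuming PyTorch (the framework is not named in the quoted text); the stand-in model, the decay factor, and the smoothing value of 0.1 are placeholders, not settings reported by the paper.

```python
import torch
import torch.nn as nn

# Placeholder hyper-parameters: d and the dropout rate stand in for the values
# tuned by validation in the paper (d in {128, 256}, dropout in [0.1, 0.5]).
d, dropout, num_epochs, warmup_epochs = 256, 0.2, 50, 5

# Hypothetical stand-in model; the paper's actual architecture is not shown here.
model = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Dropout(dropout), nn.Linear(d, 1000))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# 5-epoch warm-up followed by decay; the exact decay schedule is not specified
# in the quoted text, so a simple exponential decay is assumed here.
def lr_lambda(epoch: int) -> float:
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 0.95 ** (epoch - warmup_epochs)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Label smoothing on the answer labels (the smoothing value 0.1 is an assumption).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for epoch in range(num_epochs):
    # ... forward/backward passes over the AVSD training batches would go here,
    # with the best checkpoint selected by average validation loss ...
    scheduler.step()
```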