Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues

Authors: Hung Le, Nancy F. Chen, Steven C.H. Hoi

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results demonstrate the effectiveness of our method and provide additional insights on how models use semantic dependencies in a dialogue context to retrieve visual cues."
Researcher Affiliation | Collaboration | Hung Le, Singapore Management University (hungle.2018@smu.edu.sg); Nancy F. Chen, A*STAR Institute for Infocomm Research (nfychen@i2r.a-star.edu.sg); Steven C.H. Hoi, Salesforce Research Asia (shoi@salesforce.com)
Pseudocode | Yes | "Algorithm 1: Compositional semantic graph of dialogue context"
Open Source Code | No | The paper does not include an unambiguous statement or a direct link to the source code for the methodology described.
Open Datasets | Yes | "We use the Audio-Visual Scene-Aware Dialogue (AVSD) benchmark developed by Alamri et al. (2019)."
Dataset Splits | Yes | #Dialogs: 7,659 (train) / 1,787 (val) / 1,710 (test@DSTC7) / 1,710 (test@DSTC8); #Questions/Answers: 153,180 / 35,740 / 13,490 / 18,810; #Words: 1,450,754 / 339,006 / 110,252 / 162,226
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models.
Software Dependencies | Yes | "We first employ a co-reference resolution system, e.g. (Clark & Manning, 2016). We then explore using the Stanford parser system (v3.9.2, retrieved at https://nlp.stanford.edu/software/lex-parser.shtml) to discover sub-nodes. The parser decomposes each sentence into grammatical components, where a word and its modifier are connected in a tree structure. ... word2vec embeddings (https://code.google.com/archive/p/word2vec/) and compute the cosine similarity score. ... We experiment with the Adam optimizer (Kingma & Ba, 2015)." (Illustrative sketches of the parsing and similarity steps follow the table.)
Experiment Setup | Yes | "We experiment with the Adam optimizer (Kingma & Ba, 2015). The models are trained with a warm-up learning rate period of 5 epochs before the learning rate decays, and training runs for up to 50 epochs. The best model is selected by the average loss on the validation set. All model parameters, except the decoder parameters when using pre-trained language models, are initialized with a uniform distribution (Glorot & Bengio, 2010). The Transformer hyper-parameters are fine-tuned by validation results over d = {128, 256}, h = {1, 2, 4, 8, 16}, and a dropout rate from 0.1 to 0.5. Label smoothing (Szegedy et al., 2016) is applied on the labels of $\hat{A}_t$ (label smoothing does not help when optimizing over $\hat{R}_t$, as the labels are limited by the maximum length of dialogues, i.e. 10 in AVSD)." (A training-schedule sketch follows the table.)
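
To make the quoted graph-construction step concrete, here is a minimal sketch of sub-node discovery via dependency parsing, assuming a head-modifier edge is kept for every non-root token. spaCy stands in for the Stanford parser cited in the paper; the function name `modifier_edges` and the example sentence are illustrative, not taken from the paper.

```python
# Hedged sketch of sub-node discovery via dependency parsing.
# spaCy is a stand-in for the Stanford parser (v3.9.2) cited in the paper.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def modifier_edges(sentence: str):
    """Return (head, modifier) pairs from the sentence's dependency tree."""
    doc = nlp(sentence)
    return [(tok.head.text, tok.text) for tok in doc if tok.dep_ != "ROOT"]

# Each pair can seed a node/sub-node edge in the semantic graph,
# e.g. ('shirt', 'red') from the adjectival-modifier relation below.
print(modifier_edges("The man in the red shirt opens the door."))
```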
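The node-similarity step over word2vec embeddings might look like the following gensim-based sketch; the GoogleNews vector file, the `node_similarity` helper, and the 0.5 merge threshold are assumptions for illustration (the excerpt does not state how the score is thresholded).

```python
# Hedged sketch: compare two candidate node labels with word2vec cosine
# similarity, as in the merging step described in the paper's excerpt.
from gensim.models import KeyedVectors

# GoogleNews vectors from the word2vec archive linked in the footnote;
# the local file name is an assumption.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

def node_similarity(word_a: str, word_b: str) -> float:
    """Cosine similarity between two node labels; 0.0 if out of vocabulary."""
    if word_a not in vectors or word_b not in vectors:
        return 0.0
    return float(vectors.similarity(word_a, word_b))

# Merge two graph nodes if their labels are semantically close
# (the 0.5 threshold is illustrative, not from the paper).
if node_similarity("sofa", "couch") > 0.5:
    print("merge nodes")
```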
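Finally, the quoted experiment setup could be wired up as in this PyTorch sketch; the exact decay form, base learning rate, smoothing value, and model stub are assumptions not specified in the excerpt.

```python
# Minimal PyTorch sketch of the quoted schedule: Adam, 5-epoch warm-up,
# then decay, training up to 50 epochs, label smoothing on answer labels.
import torch
import torch.nn as nn

model = nn.Transformer(d_model=256, nhead=8)   # d and h drawn from the tuned grid
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # base lr assumed

WARMUP_EPOCHS, TOTAL_EPOCHS = 5, 50

def lr_factor(epoch: int) -> float:
    # Linear warm-up for the first 5 epochs, then inverse decay
    # (the exact decay form is not given in the excerpt).
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    return WARMUP_EPOCHS / (epoch + 1)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

# Label smoothing (Szegedy et al., 2016) on the answer labels A_t;
# the 0.1 smoothing value is an assumption.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for epoch in range(TOTAL_EPOCHS):
    # ... run one training epoch: loss = criterion(logits, answer_tokens)
    scheduler.step()
```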