Collaborative Models for Referring Expression Generation in Situated Dialogue

Authors: Rui Fang, Malcolm Doering, Joyce Chai

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results have shown that the episodic model and the installment model outperform previous non-collaborative models with an absolute gain of 6% and 21% respectively. To evaluate the performance of both the episodic model and the installment model for REG, we conducted an empirical study using crowd-sourcing from the Amazon Mechanical Turk.
Researcher Affiliation | Academia | Rui Fang, Malcolm Doering and Joyce Y. Chai, Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824, {fangrui, doeringm, jchai}@cse.msu.edu
Pseudocode | Yes | Algorithm 1 details our episodic model for generating episodic expressions. The learning model is shown in Algorithm 2.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We use the same scenes and target objects used in (Fang et al. 2013) in our evaluation in order to have a valid comparison. We used the same 48 scenes used in (Fang et al. 2013) for evaluation.
Dataset Splits | No | The paper describes a
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions using Amazon Mechanical Turk and a reinforcement learning framework (SARSA) but does not provide specific software names with version numbers for any libraries or tools used in the implementation or experimentation.
Experiment Setup | Yes | We use Pr(a_t | s_t; θ) = exp(θᵀ φ(s_t, a_t)) / Σ_{a′} exp(θᵀ φ(s_t, a′)) to choose the best action based on the current estimation of θ, with ε-greedy (ε = 0.2) for the exploration (meaning 20% of the time, we randomly choose an action). The learning rate α_t is set to 30 / (30 + t) and we stop training when the magnitude of the update ‖θ_{t+1} − θ_t‖ is smaller than 0.0001.
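
The Experiment Setup row describes a softmax policy over linear features with ε-greedy exploration (ε = 0.2), a decaying learning rate 30 / (30 + t), and a stopping criterion on the update magnitude, trained within a SARSA framework (noted in the Software Dependencies row). The following is a minimal Python sketch of that setup, not the authors' implementation: the feature function phi, the env interface (reset/step), the action set, the reward, and the discount factor gamma are illustrative placeholders that the excerpt does not specify.

```python
import numpy as np

def softmax_policy(theta, phi, state, actions, epsilon=0.2, rng=None):
    """Choose the best action under Pr(a | s; theta), with epsilon-greedy exploration."""
    rng = rng or np.random.default_rng()
    scores = np.array([theta @ phi(state, a) for a in actions])
    scores -= scores.max()                              # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()       # softmax Pr(a | s; theta)
    if rng.random() < epsilon:                          # explore 20% of the time
        return actions[rng.integers(len(actions))]
    return actions[int(np.argmax(probs))]               # otherwise take the best action

def sarsa_train(phi, env, actions, dim, gamma=0.9, tol=1e-4, max_episodes=10_000):
    """SARSA with linear function approximation and learning rate alpha_t = 30 / (30 + t).

    `env` is a hypothetical environment exposing reset() -> state and
    step(action) -> (next_state, reward, done); `dim` is the feature dimension.
    """
    theta, t = np.zeros(dim), 0
    for _ in range(max_episodes):
        theta_old = theta.copy()
        s = env.reset()
        a = softmax_policy(theta, phi, s, actions)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = None if done else softmax_policy(theta, phi, s2, actions)
            target = r + (0.0 if done else gamma * (theta @ phi(s2, a2)))
            alpha = 30.0 / (30.0 + t)                    # decaying learning rate
            theta = theta + alpha * (target - theta @ phi(s, a)) * phi(s, a)
            s, a, t = s2, a2, t + 1
        if np.linalg.norm(theta - theta_old) < tol:      # stop when updates are < 0.0001
            break
    return theta
```

In this sketch the greedy choice is the arg max of the softmax probabilities, which matches the excerpt's "choose the best action based on the current estimation of θ"; sampling from the softmax would be an alternative reading not taken here.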