Collaborative Models for Referring Expression Generation in Situated Dialogue
Authors: Rui Fang, Malcolm Doering, Joyce Chai
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results have shown that the episodic model and the installment model outperform previous non-collaborative models with an absolute gain of 6% and 21% respectively. To evaluate the performance of both the episodic model and the installment model for REG, we conducted an empirical study using crowd-sourcing from the Amazon Mechanical Turk. |
| Researcher Affiliation | Academia | Rui Fang, Malcolm Doering and Joyce Y. Chai Department of Computer Science and Engineering Michigan State University East Lansing, Michigan 48824 {fangrui, doeringm, jchai}@cse.msu.edu |
| Pseudocode | Yes | Algorithm 1 details our episodic model for generating episodic expressions. The learning model is shown in Algorithm 2. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use the same scenes and target objects used in (Fang et al. 2013) in our evaluation in order to have a valid comparison. We used the same 48 scenes used in (Fang et al. 2013) for evaluation. |
| Dataset Splits | No | The paper does not report explicit train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions using Amazon Mechanical Turk and a reinforcement learning framework (SARSA) but does not provide specific software names with version numbers for any libraries or tools used in the implementation or experimentation. |
| Experiment Setup | Yes | We use $\Pr(a_t \mid s_t; \theta) = \frac{\exp(\theta^T \phi(s_t, a_t))}{\sum_{a'} \exp(\theta^T \phi(s_t, a'))}$ to choose the best action based on the current estimation of $\theta$, with $\epsilon$-greedy ($\epsilon = 0.2$) for the exploration (meaning 20% of the time, we randomly choose an action). The learning rate $\alpha_t$ is set to $\frac{30}{30+t}$ and we stop training when the magnitude of updates $\lVert \theta_{t+1} - \theta_t \rVert$ is smaller than 0.0001. |
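
The experiment-setup row describes a log-linear (softmax) action policy, $\epsilon$-greedy exploration, a decaying learning rate, and a convergence threshold, within a SARSA-style learning framework. The sketch below shows how these pieces might fit together; it is a minimal illustration, not the authors' implementation. The feature map `phi`, the `env_step` environment interface, and all function names are hypothetical placeholders; only the softmax form of the policy, $\epsilon = 0.2$, $\alpha_t = 30/(30+t)$, and the 0.0001 stopping threshold are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def phi(state, action, dim=10):
    """Hypothetical joint state-action feature vector phi(s, a)."""
    local_rng = np.random.default_rng(hash((state, action)) % (2**32))
    return local_rng.standard_normal(dim)


def softmax_policy(theta, state, actions):
    """Pr(a | s; theta) proportional to exp(theta^T phi(s, a))."""
    scores = np.array([theta @ phi(state, a) for a in actions])
    scores -= scores.max()            # subtract max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()


def choose_action(theta, state, actions, epsilon=0.2):
    """Epsilon-greedy: 20% of the time pick a random action,
    otherwise the action with the highest softmax probability."""
    if rng.random() < epsilon:
        return actions[rng.integers(len(actions))]
    probs = softmax_policy(theta, state, actions)
    return actions[int(np.argmax(probs))]


def train(env_step, start_state, actions, dim=10, tol=1e-4, max_steps=10_000):
    """SARSA-style linear updates with alpha_t = 30 / (30 + t).

    `env_step(state, action) -> (reward, next_state, done)` is an assumed
    environment interface; the reward signal and episode structure are not
    specified here and stand in for the paper's REG setting.
    """
    theta = np.zeros(dim)
    state = start_state
    action = choose_action(theta, state, actions)
    for t in range(max_steps):
        alpha = 30.0 / (30.0 + t)                 # decaying learning rate
        reward, next_state, done = env_step(state, action)
        next_action = choose_action(theta, next_state, actions)
        target = reward + (0.0 if done else theta @ phi(next_state, next_action))
        td_error = target - theta @ phi(state, action)
        theta_new = theta + alpha * td_error * phi(state, action)
        # Stop when ||theta_{t+1} - theta_t|| < 0.0001, as in the paper.
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
        state, action = next_state, next_action
        if done:
            state = start_state
            action = choose_action(theta, state, actions)
    return theta
```

The update rule shown is a generic linear SARSA step; the paper only specifies the policy form, exploration rate, learning-rate schedule, and stopping criterion, so the temporal-difference target here should be read as one plausible instantiation rather than the authors' exact algorithm.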