Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Collaborative Models for Referring Expression Generation in Situated Dialogue
Authors: Rui Fang, Malcolm Doering, Joyce Chai
AAAI 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results have shown that the episodic model and the installment model outperform previous non-collaborative models with an absolute gain of 6% and 21% respectively. Empirical Evaluations Experimental Setup To evaluate the performance of both the episodic model and the installment model for REG, we conducted an empirical study using crowd-sourcing from the Amazon Mechanical Turk. |
| Researcher Affiliation | Academia | Rui Fang, Malcolm Doering and Joyce Y. Chai Department of Computer Science and Engineering Michigan State University East Lansing, Michigan 48824 EMAIL |
| Pseudocode | Yes | Algorithm 1 details our episodic model for generating episodic expressions. The learning model is shown in Algorithm 2. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use the same scenes and target objects used in (Fang et al. 2013) in our evaluation in order to have a valid comparison. We used the same 48 scenes used in (Fang et al. 2013) for evaluation. |
| Dataset Splits | No | The paper describes a |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions using Amazon Mechanical Turk and a reinforcement learning framework (SARSA) but does not provide specific software names with version numbers for any libraries or tools used in the implementation or experimentation. |
| Experiment Setup | Yes | We use $\Pr(a_t \mid s_t; \theta) = \frac{\exp(\theta^T \phi(s_t, a_t))}{\sum_{a'} \exp(\theta^T \phi(s_t, a'))}$ to choose the best action based on the current estimation of $\theta$, with $\epsilon$-greedy ($\epsilon = 0.2$) for the exploration (meaning 20% of the time, we randomly choose an action). The learning rate $\alpha_t$ is set to $\frac{30}{30+t}$ and we stop training when the magnitude of updates $\lVert \theta_{t+1} - \theta_t \rVert$ is smaller than 0.0001. |
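
The Experiment Setup row quotes four concrete settings: a softmax action distribution over linear features, ϵ-greedy exploration with ϵ = 0.2, a learning rate of 30/(30+t), and a stopping threshold of 0.0001. The sketch below shows one way these pieces might fit together. It is not the authors' code, which the paper does not release. All function names are hypothetical, the feature vectors are supplied by the caller, and the Euclidean norm is an assumption, since the quoted text only says "magnitude of updates".

```python
import math
import random

EPSILON = 0.2      # exploration rate quoted in the table
TOLERANCE = 1e-4   # stopping threshold quoted in the table


def softmax_policy(theta, features_per_action):
    """Return Pr(a | s; theta) for every action, given phi(s, a) per action."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in features_per_action]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(theta, features_per_action, rng=random):
    """Epsilon-greedy: a random action 20% of the time, else the most probable one."""
    if rng.random() < EPSILON:
        return rng.randrange(len(features_per_action))
    probs = softmax_policy(theta, features_per_action)
    return max(range(len(probs)), key=probs.__getitem__)


def learning_rate(t):
    """Decaying step size alpha_t = 30 / (30 + t)."""
    return 30.0 / (30.0 + t)


def has_converged(theta_old, theta_new):
    """Assumed Euclidean norm of the update, compared against the tolerance."""
    diff = math.sqrt(sum((n - o) ** 2 for o, n in zip(theta_old, theta_new)))
    return diff < TOLERANCE
```

A training loop would call `choose_action` at each step, apply the SARSA update scaled by `learning_rate(t)`, and exit once `has_converged` returns true. The update rule itself is omitted here because the table does not quote it.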