Learning to Communicate and Solve Visual Blocks-World Tasks
Authors: Qi Zhang, Richard Lewis, Satinder Singh, Edmund Durfee (pp. 5781-5788)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments have two primary aims. The first aim is to demonstrate and understand the generalization abilities of the speaker-listener agents by examining their interpolation and extrapolation (these terms are defined formally below) performance. We contrast the performances obtained via Bandit, SL, and RL training, and use the single-agent as a useful baseline to provide a kind of upper-bound on expected performance of the speaker-listener agents. |
| Researcher Affiliation | Academia | Qi Zhang,1 Richard Lewis,2 Satinder Singh,1 Edmund Durfee1 1Computer Science and Engineering, 2Department of Psychology, University of Michigan {qizhg,rickl,baveja,durfee}@umich.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code access. |
| Open Datasets | No | We created data sets for our experiments by first sampling configurations from the probabilistic grammar and populating bins corresponding to configuration size (number of blocks) with unique configurations. |
| Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly mention a distinct validation set or its split details. 'All configurations with number of blocks N satisfying N ≤ B, except for configuration sizes of multiples of 5, are used for training (i.e., we trained only on configurations N ≤ B, N mod 5 ≠ 0). All other configurations are used for testing. Specifically, configurations with N < B, N mod 5 = 0 are used for interpolation testing, and N > B for extrapolation.' |
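The quoted split rule keys entirely on the configuration size N relative to a threshold B. A minimal sketch of that rule, assuming `B` is the paper's size threshold and that sizes falling into no quoted category are simply unused:

```python
# Hypothetical sketch of the quoted train/interpolation/extrapolation split,
# keyed on the number of blocks N and a size threshold B (names assumed).
def split_label(n_blocks: int, B: int) -> str:
    """Assign a configuration to a split based on its size N."""
    if n_blocks <= B and n_blocks % 5 != 0:
        return "train"            # N <= B, N not a multiple of 5
    if n_blocks < B and n_blocks % 5 == 0:
        return "interpolation"    # held-out multiples of 5 inside the training range
    if n_blocks > B:
        return "extrapolation"    # sizes larger than any seen in training
    return "unused"               # e.g. N == B when B is a multiple of 5

# With B = 10: size 7 trains, size 5 tests interpolation, size 12 extrapolation.
```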
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions software components and architectures like 'recurrent neural-network agents', 'convolutional layers', 'LSTM networks', 'Gumbel-softmax', 'ReLU layer', and 'softmax layer', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The vocabulary size is set to \|V\| = 10 symbols, the utterance length is set to L = 15 symbols. ... (The Bandit loss function L_BL = L_Bandit - λ L_entropy is defined via ... where H is entropy, λ ≥ 0 is the entropy regularization coefficient, ...) ... (the cumulative reward from time step t with discount factor γ = 0.99, and b is the baseline of REINFORCE which is set to be the average episodic reward of the previous epoch.) |
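The quoted setup combines a REINFORCE-style bandit loss with an entropy bonus and a fixed scalar baseline b (the previous epoch's average episodic reward). A minimal NumPy sketch under those assumptions; the function name, argument shapes, and λ default are illustrative, not the paper's:

```python
import numpy as np

# Hypothetical sketch of the described loss: REINFORCE with an entropy
# regularizer, L_BL = L_Bandit - lam * L_entropy (sign convention assumed).
def bandit_loss(probs, actions, returns, baseline, lam=0.01):
    """probs: (T, |V|) symbol probabilities; actions: (T,) chosen symbol ids;
    returns: (T,) discounted returns G_t; baseline: scalar b from last epoch."""
    log_probs = np.log(probs)
    # log pi(a_t) for each chosen symbol
    chosen = log_probs[np.arange(len(actions)), actions]
    # REINFORCE term: -(G_t - b) * log pi(a_t), averaged over the utterance
    l_bandit = -np.mean((returns - baseline) * chosen)
    # Entropy H of the symbol distribution, averaged over time steps
    entropy = -np.mean(np.sum(probs * log_probs, axis=-1))
    return l_bandit - lam * entropy

def discounted_returns(rewards, gamma=0.99):
    """Cumulative discounted reward G_t from each time step t."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]
```

With the paper's γ = 0.99, a terminal reward of 1 over three steps yields returns [0.9801, 0.99, 1.0].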