Learning to Communicate and Solve Visual Blocks-World Tasks

Authors: Qi Zhang, Richard Lewis, Satinder Singh, Edmund Durfee (pp. 5781-5788)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments have two primary aims. The first aim is to demonstrate and understand the generalization abilities of the speaker-listener agents by examining their interpolation and extrapolation (these terms are defined formally below) performance. We contrast the performances obtained via Bandit, SL, and RL training, and use the single-agent as a useful baseline to provide a kind of upper-bound on expected performance of the speaker-listener agents.
Researcher Affiliation | Academia | Qi Zhang (1), Richard Lewis (2), Satinder Singh (1), Edmund Durfee (1); 1: Computer Science and Engineering, 2: Department of Psychology, University of Michigan; {qizhg,rickl,baveja,durfee}@umich.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code access.
Open Datasets | No | We created data sets for our experiments by first sampling configurations from the probabilistic grammar and populating bins corresponding to configuration size (number of blocks) with unique configurations.
Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly mention a distinct validation set or its split details: 'All configurations with number of blocks N satisfying N ≤ B, except for configuration sizes of multiples of 5, are used for training (i.e., we trained only on configurations N ≤ B, N mod 5 ≠ 0). All other configurations are used for testing. Specifically, configurations with N < B, N mod 5 = 0 are used for interpolation testing, and N > B for extrapolation.' (A split-rule sketch follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions software components and architectures like 'recurrent neural-network agents', 'convolutional layers', 'LSTM networks', 'Gumbel-softmax', 'ReLU layer', and 'softmax layer', but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The vocabulary size is set to |V| = 10 symbols, and the utterance length is set to L = 15 symbols. ... (The Bandit loss function L_BL = L_Bandit − λ·L_entropy is defined via ... where H is entropy and λ ≥ 0 is the entropy regularization coefficient, ...) ... (the cumulative reward from time step t with discount factor γ = 0.99, and b is the baseline of REINFORCE, which is set to be the average episodic reward of the previous epoch.) (A loss-function sketch follows the table.)
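
The Dataset Splits rule quoted above partitions configurations by their block count N. The following minimal Python sketch illustrates that partition; the function name, the cutoff parameter B, and the representation of configurations as a list of block counts are illustrative assumptions, since the paper does not release code.

```python
# Hedged sketch of the quoted split rule; B (the training-size cutoff) and the
# list-of-block-counts input format are assumptions made for illustration.
def split_by_size(config_sizes, B):
    """Partition configuration indices by number of blocks N:
    training uses N <= B with N mod 5 != 0; configurations with N mod 5 == 0
    (and N <= B) are held out for interpolation testing; N > B for extrapolation."""
    train, interpolation, extrapolation = [], [], []
    for i, n in enumerate(config_sizes):
        if n > B:
            extrapolation.append(i)
        elif n % 5 == 0:
            interpolation.append(i)
        else:
            train.append(i)
    return train, interpolation, extrapolation


# Example: with B = 12, sizes 5 and 10 go to the interpolation split,
# sizes 13 and 15 to extrapolation, and the rest to training.
print(split_by_size([3, 5, 8, 10, 12, 13, 15], B=12))
```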
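
The Experiment Setup row quotes an entropy-regularized Bandit/REINFORCE objective with discount factor γ = 0.99 and a scalar baseline b equal to the previous epoch's average episodic reward. Below is a minimal PyTorch-style sketch of one such objective; the function signature, the sign convention on the entropy term, and the default λ value are assumptions for illustration, not the paper's exact implementation.

```python
import torch

def reinforce_loss(log_probs, entropies, rewards, baseline, gamma=0.99, lam=0.01):
    """Entropy-regularized REINFORCE surrogate loss for one episode.

    log_probs: list of scalar tensors, log pi(a_t | s_t) for each step
    entropies: list of scalar tensors, policy entropy H at each step
    rewards:   list of floats, per-step rewards
    baseline:  scalar b, e.g. the previous epoch's average episodic reward
    lam:       entropy-regularization coefficient (lambda >= 0); value here is illustrative
    """
    # Discounted return G_t from each time step t (gamma = 0.99 as quoted).
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    log_probs = torch.stack(log_probs)
    entropies = torch.stack(entropies)

    # Policy-gradient surrogate: minimize -(G_t - b) * log pi(a_t | s_t).
    policy_loss = -((returns - baseline) * log_probs).sum()

    # Subtract the entropy term so that minimizing the loss encourages higher
    # entropy (one reading of L_BL = L_Bandit - lambda * L_entropy).
    return policy_loss - lam * entropies.sum()
```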