Learning to Communicate and Solve Visual Blocks-World Tasks
Authors: Qi Zhang, Richard Lewis, Satinder Singh, Edmund Durfee (pp. 5781-5788)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments have two primary aims. The first aim is to demonstrate and understand the generalization abilities of the speaker-listener agents by examining their interpolation and extrapolation (these terms are defined formally below) performance. We contrast the performances obtained via Bandit, SL, and RL training, and use the single-agent as a useful baseline to provide a kind of upper-bound on expected performance of the speaker-listener agents. |
| Researcher Affiliation | Academia | Qi Zhang,1 Richard Lewis,2 Satinder Singh,1 Edmund Durfee1 1Computer Science and Engineering, 2Department of Psychology, University of Michigan {qizhg,rickl,baveja,durfee}@umich.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code access. |
| Open Datasets | No | We created data sets for our experiments by first sampling configurations from the probabilistic grammar and populating bins corresponding to configuration size (number of blocks) with unique configurations. |
| Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly mention a distinct validation set or its split details. 'All configurations with number of blocks N satisfying N ≤ B, except for configuration sizes of multiples of 5, are used for training (i.e., we trained only on configurations N ≤ B, N mod 5 ≠ 0). All other configurations are used for testing. Specifically, configurations with N < B, N mod 5 = 0 are used for interpolation testing, and N > B for extrapolation.' |
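The quoted split rule keys entirely on the configuration size N relative to a threshold B. A minimal sketch of that rule, assuming `B` is the paper's size threshold and that sizes falling into no quoted category are simply unused:

```python
# Hypothetical sketch of the quoted train/interpolation/extrapolation split,
# keyed on the number of blocks N and a size threshold B (names assumed).
def split_label(n_blocks: int, B: int) -> str:
    """Assign a configuration to a split based on its size N."""
    if n_blocks <= B and n_blocks % 5 != 0:
        return "train"            # N <= B, N not a multiple of 5
    if n_blocks < B and n_blocks % 5 == 0:
        return "interpolation"    # held-out multiples of 5 inside the training range
    if n_blocks > B:
        return "extrapolation"    # sizes larger than any seen in training
    return "unused"               # e.g. N == B when B is a multiple of 5

# With B = 10: size 7 trains, size 5 tests interpolation, size 12 extrapolation.
```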
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions software components and architectures like 'recurrent neural-network agents', 'convolutional layers', 'LSTM networks', 'Gumbel-softmax', 'ReLU layer', and 'softmax layer', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The vocabulary size is set to \|V\| = 10 symbols, the utterance length is set to L = 15 symbols. ... (The Bandit loss function L_BL = L_Bandit - λ L_entropy is defined via ... where H is entropy, λ ≥ 0 is the entropy regularization coefficient, ...) ... (the cumulative reward from time step t with discount factor γ = 0.99, and b is the baseline of REINFORCE which is set to be the average episodic reward of the previous epoch.) |
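The quoted setup combines a REINFORCE-style bandit loss with an entropy bonus and a fixed scalar baseline b (the previous epoch's average episodic reward). A minimal NumPy sketch under those assumptions; the function name, argument shapes, and λ default are illustrative, not the paper's:

```python
import numpy as np

# Hypothetical sketch of the described loss: REINFORCE with an entropy
# regularizer, L_BL = L_Bandit - lam * L_entropy (sign convention assumed).
def bandit_loss(probs, actions, returns, baseline, lam=0.01):
    """probs: (T, |V|) symbol probabilities; actions: (T,) chosen symbol ids;
    returns: (T,) discounted returns G_t; baseline: scalar b from last epoch."""
    log_probs = np.log(probs)
    # log pi(a_t) for each chosen symbol
    chosen = log_probs[np.arange(len(actions)), actions]
    # REINFORCE term: -(G_t - b) * log pi(a_t), averaged over the utterance
    l_bandit = -np.mean((returns - baseline) * chosen)
    # Entropy H of the symbol distribution, averaged over time steps
    entropy = -np.mean(np.sum(probs * log_probs, axis=-1))
    return l_bandit - lam * entropy

def discounted_returns(rewards, gamma=0.99):
    """Cumulative discounted reward G_t from each time step t."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]
```

With the paper's γ = 0.99, a terminal reward of 1 over three steps yields returns [0.9801, 0.99, 1.0].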