Discovering objects and their relations from entangled scene representations
Authors: David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, Peter Battaglia
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test the ability of RNs to discover relations between objects, we turned to the classification of scenes, wherein classification boundaries were defined by the relational structure of the objects in the scenes. The tasks on which we assessed the RNs' performance fell into three categories. Here, we trained RNs on variations of the classification task described in section 3.2 and contrasted their performance with that of MLPs of different sizes and depths. |
| Researcher Affiliation | Industry | D. Raposo, A. Santoro, D.G.T. Barrett, R. Pascanu, T. Lillicrap, P. Battaglia, DeepMind, London, United Kingdom {draposo, adamsantoro, barrettdavid, razp, countzero, peterbattaglia}@google.com |
| Pseudocode | No | RNs are inspired by Interaction Networks (INs) (Battaglia et al., 2016), and therefore share similar functional insights. Explanation: The paper describes the architecture and functions of the Relation Networks (RNs) in detail, but it does not include any formal pseudocode or algorithm blocks. (A hedged sketch of the RN pairwise computation appears below the table.) |
| Open Source Code | No | The datasets will be made freely available. Explanation: The paper states that the datasets used will be made freely available, but it does not provide any statement or link indicating that the source code for the methodology described in the paper is openly accessible. |
| Open Datasets | No | Custom datasets were required to both explicitly test solutions to the task of inferring object relations, and to actively control for solutions that do not depend on object relations. The datasets will be made freely available. Explanation: The paper states that 'The datasets will be made freely available,' which implies future access, but it does not provide concrete access information such as a specific link, DOI, or a formal citation to an already available public dataset. |
| Dataset Splits | No | Training data consisted of 5000 samples derived from 5, 10, or 20 unique classes, with testing data comprising withheld within-class samples. All figures show performance on a withheld test-set, constituting 2-5% of the size of the training set. Explanation: The paper specifies the size of the training data and the test set split, but it does not explicitly mention a separate validation set or its proportion/size for reproducing the experiment. |
| Hardware Specification | No | The Adam optimizer was used for optimization (Kingma & Ba, 2014), with a learning rate of 1e-4 for the scene description tasks, and a learning rate of 1e-5 for the one-shot learning task. Explanation: The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The Adam optimizer was used for optimization (Kingma & Ba, 2014), with a learning rate of 1e-4 for the scene description tasks, and a learning rate of 1e-5 for the one-shot learning task. Explanation: The paper mentions algorithms and architectures used (e.g., Adam optimizer, VAE, LSTM), but it does not specify any software dependencies with their version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The sizes of the RN in terms of number of layers and number of units for both f_φ and g_ψ were {200, 200}, {500, 500}, {1000, 1000}, or {200, 200, 200}. The MLP baseline models used equivalent sizes. We experimented with different sizes for the summary vector s_{i,j}, which is the output from the RN. Performance is generally robust to the choice of size, with similar results emerging for 100, 200, or 500. The MANN used an LSTM controller size of 200, 128 memory slots, 40 for the memory size, and 4 read and write heads. The Adam optimizer was used for optimization (Kingma & Ba, 2014), with a learning rate of 1e-4 for the scene description tasks, and a learning rate of 1e-5 for the one-shot learning task. The number of iterations varied for each experiment and is indicated in the relevant figures. All figures show performance on a withheld test set, constituting 2-5% of the size of the training set. The number of training samples was 5000 per class for scene description tasks, and 200 per class (for 100 classes) for the pixel disentangling experiment. We used minibatch training, with batch sizes of 100 for the scene description experiments, and 16 (with sequences of length 50) for the one-shot learning task. (A hypothetical training-step sketch using these settings appears below the table.) |
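
As the Pseudocode row notes, the paper describes the Relation Network only in prose. The block below is a minimal, hedged sketch of the pairwise computation attributed to RNs: an MLP g_ψ is applied to every pair of object representations, the pairwise outputs are aggregated (summation is one common choice), and a second MLP f_φ maps the aggregate summary to class scores. The PyTorch framing, class name, layer sizes, and aggregation choice are assumptions for illustration, not the authors' released implementation.

```python
import itertools
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Sketch of an RN: g_psi over object pairs, summed, then f_phi."""

    def __init__(self, object_dim, num_classes, hidden=200, summary_dim=200):
        super().__init__()
        # g_psi: MLP applied to each concatenated object pair (e.g. {200, 200})
        self.g_psi = nn.Sequential(
            nn.Linear(2 * object_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, summary_dim), nn.ReLU(),
        )
        # f_phi: MLP applied to the aggregated summary vector (e.g. {200, 200})
        self.f_phi = nn.Sequential(
            nn.Linear(summary_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, objects):
        # objects: (batch, num_objects, object_dim)
        n = objects.size(1)
        pair_outputs = []
        for i, j in itertools.combinations(range(n), 2):
            pair = torch.cat([objects[:, i], objects[:, j]], dim=-1)
            pair_outputs.append(self.g_psi(pair))
        # Aggregate the pairwise summaries; summation keeps the model
        # invariant to the order in which pairs are processed.
        summary = torch.stack(pair_outputs, dim=0).sum(dim=0)
        return self.f_phi(summary)
```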
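The Experiment Setup row quotes the optimizer and minibatch settings; a single training step with those values might look like the following, reusing the `RelationNetwork` sketch above. The object dimensionality (16), objects per scene (8), and class count (10) are placeholders, since the excerpt does not specify them, and PyTorch is used only as an example framework.

```python
# Hypothetical training step for a scene description task, using the quoted
# hyperparameters: Adam optimizer, learning rate 1e-4, minibatch of 100.
model = RelationNetwork(object_dim=16, num_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

objects = torch.randn(100, 8, 16)      # one minibatch of 100 scene descriptions
labels = torch.randint(0, 10, (100,))  # placeholder class labels

optimizer.zero_grad()
loss = loss_fn(model(objects), labels)
loss.backward()
optimizer.step()
```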