Saliency-based Sequential Image Attention with Multiset Prediction
Authors: Sean Welleck, Jialin Mao, Kyunghyun Cho, Zheng Zhang
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the classification performance, training process, and hierarchical attention with set-based and multiset-based classification experiments. To test the effectiveness of the permutation-invariant RL training, we compare against a baseline model that uses a cross-entropy loss on the probabilities $p_{t,i}$ and (randomly ordered) labels $y_i$ instead of the RL training, similar to the training proposed in [42]. Datasets: Two synthetic datasets, MNIST Set and MNIST Multiset, as well as the real-world SVHN dataset, are used. Each dataset is split into 60,000 training examples and 10,000 testing examples, and metrics are reported for the testing set. |
| Researcher Affiliation | Academia | Sean Welleck, New York University, wellecks@nyu.edu; Jialin Mao, New York University, jialin.mao@nyu.edu; Kyunghyun Cho, New York University, kyunghyun@nyu.edu; Zheng Zhang, New York University, zz@nyu.edu |
| Pseudocode | No | The paper describes the architecture and processes in prose and with diagrams (Figure 1), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific repository links or explicit statements about the release of source code for the described methodology. |
| Open Datasets | Yes | Two synthetic datasets, MNIST Set and MNIST Multiset, as well as the real-world SVHN dataset, are used. |
| Dataset Splits | Yes | For MNIST Set and Multiset, each 100x100 image in the dataset has a variable number (1-4) of digits, of varying sizes (20-50px) and positions, along with cluttering objects that introduce noise. Each dataset is split into 60,000 training examples and 10,000 testing examples, and metrics are reported for the testing set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instance specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'ResNet-34 network pre-trained on ImageNet' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Images are resized to 224x224, and the final (4th) convolutional layer is used ($V \in \mathbb{R}^{512 \times 7 \times 7}$). Since the label sets vary in size, the model is trained with an extra 'stop' class, and during inference greedy argmax sampling is used until the 'stop' class is predicted. |
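The inference procedure in the Experiment Setup row (greedy argmax sampling until the 'stop' class is predicted) can be sketched as follows. This is a minimal illustration, not the authors' code: `step_fn` is a hypothetical stand-in for the attention model's per-step class distribution, and the class count (10 digit classes plus one 'stop' class) is assumed from the MNIST-based setup.

```python
import numpy as np

STOP = 10  # assumed index of the extra "stop" class (10 digit classes + stop)


def greedy_decode(step_fn, state, max_steps=8):
    """Greedily take the argmax class at each step until the 'stop'
    class is predicted, mirroring the paper's inference procedure.
    `step_fn(state, t)` stands in for the model's step-t distribution."""
    predicted = []
    for t in range(max_steps):
        probs = step_fn(state, t)      # shape (11,): 10 digits + stop
        c = int(np.argmax(probs))      # greedy argmax sampling
        if c == STOP:
            break
        predicted.append(c)
    return predicted


# Toy stand-in policy: predicts digits 3, 3, 7, then stop.
# Note the repeated 3 -- a multiset prediction, unlike a plain set.
def toy_step(state, t):
    seq = [3, 3, 7, STOP]
    p = np.full(11, 0.01)
    p[seq[min(t, 3)]] = 0.9
    return p


print(greedy_decode(toy_step, state=None))  # -> [3, 3, 7]
```

Because decoding stops only when 'stop' wins the argmax, the model handles the variable label-set sizes (1-4 digits per image) without a fixed output length.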
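The baseline described in the Research Type row (cross-entropy on the per-step probabilities against randomly ordered labels, instead of the RL training) can be sketched like this. It is a hedged illustration under assumptions: the function name and the probability-matrix layout are invented for the example, and the random ordering is sampled fresh per call, which is what makes the loss impose an arbitrary target order rather than being permutation-invariant.

```python
import numpy as np

rng = np.random.default_rng(0)


def baseline_ce_loss(step_probs, labels):
    """Cross-entropy between per-step probabilities p_{t,i} and a
    randomly ordered target sequence -- the non-RL baseline the paper
    compares against. `step_probs[t]` is the step-t class distribution."""
    order = rng.permutation(len(labels))       # random label order
    targets = [labels[i] for i in order]
    nll = -sum(np.log(step_probs[t][y]) for t, y in enumerate(targets))
    return nll / len(targets)


# With uniform step distributions the loss is log(11) regardless of the
# sampled order, since every target class has probability 1/11.
uniform = np.full((3, 11), 1.0 / 11)
print(baseline_ce_loss(uniform, [2, 5, 5]))  # -> log(11) ~= 2.3979
```

Unlike this baseline, the RL objective rewards predicting the correct multiset in any order, which is the permutation invariance the paper's comparison is designed to test.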