Hierarchical Attentive Recurrent Tracking

Authors: Adam Kosiorek, Alex Bewley, Ingmar Posner

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluation of the proposed model is performed on two datasets: pedestrian tracking on the KTH activity recognition dataset and the more difficult KITTI object tracking dataset." Section 5 presents experiments on the KTH and KITTI datasets with comparisons to related attention-based trackers.
Researcher Affiliation | Academia | Adam R. Kosiorek, Alex Bewley, and Ingmar Posner, Department of Engineering Science, University of Oxford (adamk@robots.ox.ac.uk, bewley@robots.ox.ac.uk, ingmar@robots.ox.ac.uk).
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and results are available online" at https://github.com/akosiorek/hart.
Open Datasets | Yes | "Evaluation of the proposed model is performed on two datasets: pedestrian tracking on the KTH activity recognition dataset and the more difficult KITTI object tracking dataset."
Dataset Splits | No | "We split all sequences into 80/20 sequences for train and test sets, respectively." No validation split percentage is explicitly mentioned.
Hardware Specification | Yes | "The donation from Nvidia of the Titan Xp GPU used in this work is also gratefully acknowledged."
Software Dependencies | No | The paper mentions the "RMSProp optimiser [9]" and "AlexNet [1]", but does not provide version numbers for any software libraries or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | "We use their pre-trained feature extractor. We follow the authors and set the glimpse size (h, w) = (28, 28). We replicate the training procedure exactly, with the exception of using the RMSProp optimiser [9] with a learning rate of 3.33 × 10⁻⁵ and momentum set to 0.9. ... Our feature map has the size of 14 × 14 × 384 with an input glimpse of size (h, w) = (56, 56). We apply dropout with probability 0.25 at the end of V1. We used 100 hidden units in the RNN with orthogonal initialisation and Zoneout [21] with probability set to 0.05. The system was trained via curriculum learning [2], by starting with sequences of length five and increasing sequence length every 13 epochs, with epoch length decreasing with increasing sequence length. We used the same optimisation settings, with the exception of the learning rate, which we set to 3.33 × 10⁻⁶."
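The curriculum schedule and optimiser settings quoted in the Experiment Setup row can be summarised in a few lines of Python. This is an illustrative reconstruction, not the authors' released code: the function and key names are made up here, and the assumption that sequence length grows by one per step is mine (the paper only states the starting length of five and the 13-epoch interval).

```python
def curriculum_seq_len(epoch, start_len=5, epochs_per_step=13, step=1):
    """Training sequence length at a given epoch under curriculum learning.

    start_len and epochs_per_step come from the quoted setup; the growth
    increment `step` is an assumption, since the paper only says the
    sequence length "increases" every 13 epochs.
    """
    return start_len + step * (epoch // epochs_per_step)


# Reported optimiser settings for the KITTI experiments; the KTH
# comparison run used a learning rate of 3.33e-5 instead.
KITTI_OPTIMISER = {
    "optimizer": "RMSProp",
    "learning_rate": 3.33e-6,
    "momentum": 0.9,
}

if __name__ == "__main__":
    for epoch in (0, 12, 13, 26):
        print(epoch, curriculum_seq_len(epoch))
```

Under these assumptions, epochs 0-12 train on length-5 sequences, epochs 13-25 on length-6 sequences, and so on.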