Is Multiple Object Tracking a Matter of Specialization?

Authors: Gianluca Mancusi, Mattia Bernardi, Aniello Panariello, Angelo Porrello, Rita Cucchiara, Simone Calderara

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on MOTSynth, along with zero-shot evaluations on MOT17 and PersonPath22, demonstrate that a neural tracker built from carefully selected modules surpasses its monolithic counterpart. We release models and code."
Researcher Affiliation | Academia | "Gianluca Mancusi, Mattia Bernardi, Aniello Panariello, Angelo Porrello, Rita Cucchiara, Simone Calderara; AImageLab, University of Modena and Reggio Emilia; name.surname@unimore.it"
Pseudocode | No | The paper describes the methods in detail but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We release models and code."
Open Datasets | Yes | "MOTSynth [10] is a large synthetic dataset for pedestrian detection and tracking in urban scenarios, generated using a photorealistic video game. It comprises 764 full HD videos, each 1800 frames long, showcasing various attributes."
Dataset Splits | Yes | "In our experiments, following [29], we reduced the test sequences to 600 frames each and further split the training set to extract 48 validation sequences, shortened to 150 frames, for validation during training." (see the split sketch after the table)
Hardware Specification | Yes | "The training is performed on a single RTX 4080 GPU with a batch size of 1 for both phases."
Software Dependencies | No | The paper mentions components such as YOLOX, DanceTrack, and ResNet, but does not provide specific version numbers for these or for other libraries and frameworks.
Experiment Setup | Yes | "The learning rates are set to 5 × 10^-5 for the transformer and 1 × 10^-6 for the visual backbone. For the LoRA hyperparameters, we use r = 16, a weight decay of 0.1, and a learning rate of 3 × 10^-4. The scale & shift layers employ a learning rate of 1 × 10^-5 and a weight decay of 1 × 10^-4. The training is performed on a single RTX 4080 GPU with a batch size of 1 for both phases. Due to the small batch size, we accumulate gradients over four backward steps before performing an optimizer step." (see the optimizer sketch after the table)
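
The split protocol quoted in the Dataset Splits row (test sequences cut to 600 frames, 48 training sequences held out for validation and shortened to 150 frames) can be pictured with a minimal sketch. This is an illustration only: the `Sequence` container, the helper names, and the rule for choosing which 48 sequences become validation are assumptions, not the authors' code or the exact protocol of [29].

```python
# Hypothetical sketch of the MOTSynth split described in the paper: test sequences
# trimmed to 600 frames, 48 training sequences reserved for validation at 150 frames.
# The Sequence container and selection rule are illustrative, not the authors' API.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sequence:
    name: str
    frames: List[dict]  # per-frame annotations (boxes, track ids, ...)


def trim(seq: Sequence, max_frames: int) -> Sequence:
    """Keep only the first `max_frames` frames of a sequence."""
    return Sequence(seq.name, seq.frames[:max_frames])


def build_splits(train_seqs: List[Sequence],
                 test_seqs: List[Sequence],
                 num_val: int = 48) -> Dict[str, List[Sequence]]:
    # Hold out the last `num_val` training sequences for validation (the concrete
    # selection rule in [29] may differ) and shorten them to 150 frames.
    val = [trim(s, 150) for s in train_seqs[-num_val:]]
    train = train_seqs[:-num_val]
    # Test sequences are reduced to 600 frames each.
    test = [trim(s, 600) for s in test_seqs]
    return {"train": train, "val": val, "test": test}
```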
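
The Experiment Setup row reports per-module learning rates and weight decays plus gradient accumulation over four backward passes at batch size 1. The PyTorch sketch below shows one way those numbers could be wired together; the module names (`transformer`, `backbone`, `lora_parameters`, `scale_shift`), the choice of AdamW, and the training-loop skeleton are assumptions, not the authors' released code.

```python
# Hypothetical wiring of the reported hyperparameters in PyTorch. Module names
# and the optimizer family are assumptions, not the authors' actual code.
import torch


def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Note: the LoRA rank r = 16 is fixed when the adapters are constructed, not here.
    param_groups = [
        {"params": model.transformer.parameters(), "lr": 5e-5},
        {"params": model.backbone.parameters(),    "lr": 1e-6},
        {"params": model.lora_parameters(),        "lr": 3e-4, "weight_decay": 0.1},
        {"params": model.scale_shift.parameters(), "lr": 1e-5, "weight_decay": 1e-4},
    ]
    # Default weight decay for the groups that do not specify one is an assumption.
    return torch.optim.AdamW(param_groups, weight_decay=0.0)


ACCUM_STEPS = 4  # batch size 1, so gradients are accumulated over four backward passes


def train_one_epoch(model, loader, optimizer, compute_loss):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = compute_loss(model, batch) / ACCUM_STEPS  # scale loss for accumulation
        loss.backward()
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Dividing the loss by `ACCUM_STEPS` keeps the accumulated gradient on the same scale as a single step over an effective batch of four.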