Is Multiple Object Tracking a Matter of Specialization?
Authors: Gianluca Mancusi, Mattia Bernardi, Aniello Panariello, Angelo Porrello, Rita Cucchiara, Simone Calderara
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on MOTSynth, along with zero-shot evaluations on MOT17 and PersonPath22, demonstrate that a neural tracker built from carefully selected modules surpasses its monolithic counterpart. We release models and code. |
| Researcher Affiliation | Academia | Gianluca Mancusi, Mattia Bernardi, Aniello Panariello, Angelo Porrello, Rita Cucchiara, Simone Calderara; AImage Lab, University of Modena and Reggio Emilia; name.surname@unimore.it |
| Pseudocode | No | The paper describes the methods in detail but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release models and code. |
| Open Datasets | Yes | MOTSynth [10] is a large synthetic dataset for pedestrian detection and tracking in urban scenarios, generated using a photorealistic video game. It comprises 764 full HD videos, each 1800 frames long, showcasing various attributes. |
| Dataset Splits | Yes | In our experiments, following [29], we reduced the test sequences to 600 frames each and further split the training set to extract 48 validation sequences, shortened to 150 frames, for validation during training. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | The training is performed on a single RTX 4080 GPU with a batch size of 1 for both phases. |
| Software Dependencies | No | The paper mentions software like YOLOX, DanceTrack, and ResNet but does not provide specific version numbers for these or other libraries/frameworks. |
| Experiment Setup | Yes | The learning rates are set to 5×10⁻⁵ for the transformer and 1×10⁻⁶ for the visual backbone. For the LoRA hyperparameters, we use r = 16, a weight decay of 0.1, and a learning rate of 3×10⁻⁴. The scale & shift layers employ a learning rate of 1×10⁻⁵ and a weight decay of 1×10⁻⁴. The training is performed on a single RTX 4080 GPU with a batch size of 1 for both phases. Due to the small batch size, we accumulate gradients over four backward steps before performing an optimizer step. (A hedged configuration sketch follows the table.) |
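
The split protocol quoted in the Dataset Splits row (test sequences cut to 600 frames; 48 training sequences carved out as a validation set and shortened to 150 frames) can be illustrated with a minimal sketch. The representation of sequences as frame-index lists, the random selection of validation sequences, and the seed are assumptions; only the three numbers come from the quoted text.

```python
import random

# Minimal sketch of the split protocol described in the paper. Assumptions:
# each split is a dict mapping a sequence id to an ordered list of frame
# indices; which training sequences become validation sequences is chosen
# at random here, with a hypothetical seed.
TEST_FRAMES = 600        # test sequences shortened to 600 frames
VAL_FRAMES = 150         # validation sequences shortened to 150 frames
NUM_VAL_SEQUENCES = 48   # validation sequences carved out of the training set


def build_splits(train_sequences: dict, test_sequences: dict, seed: int = 0):
    """Truncate test sequences and carve a validation split from training."""
    rng = random.Random(seed)

    # Shorten every test sequence to its first 600 frames.
    test = {sid: frames[:TEST_FRAMES] for sid, frames in test_sequences.items()}

    # Pick 48 training sequences for validation, shortened to 150 frames.
    val_ids = set(rng.sample(sorted(train_sequences), NUM_VAL_SEQUENCES))
    val = {sid: train_sequences[sid][:VAL_FRAMES] for sid in val_ids}

    # The remaining sequences stay in the training split at full length.
    train = {sid: f for sid, f in train_sequences.items() if sid not in val_ids}
    return train, val, test
```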
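
The Experiment Setup row can likewise be summarized as a training-loop sketch. The optimizer choice (AdamW), the module names (`transformer`, `backbone`, `lora`, `scale_shift`), and the assumption that the model returns its loss directly are hypothetical; only the learning rates, weight decays, batch size, and the four-step gradient accumulation are taken from the quoted text.

```python
import torch

# Hedged sketch of the reported optimization setup. The LoRA rank (r = 16)
# is configured when the adapters are created and is not shown here.
ACCUM_STEPS = 4  # accumulate gradients over four backward passes (batch size 1)


def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # One parameter group per component, with the learning rates and weight
    # decays quoted in the table; AdamW is an assumption.
    param_groups = [
        {"params": model.transformer.parameters(), "lr": 5e-5},
        {"params": model.backbone.parameters(), "lr": 1e-6},
        {"params": model.lora.parameters(), "lr": 3e-4, "weight_decay": 0.1},
        {"params": model.scale_shift.parameters(), "lr": 1e-5, "weight_decay": 1e-4},
    ]
    return torch.optim.AdamW(param_groups)


def train_one_epoch(model, loader, optimizer):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader):      # loader yields batches of size 1
        loss = model(batch)                    # assumes the model returns its training loss
        (loss / ACCUM_STEPS).backward()        # scale so accumulated gradients match a larger batch
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()
            optimizer.zero_grad()
```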