Tracking People with 3D Representations

Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on the PoseTrack, MuPoTS and AVA datasets. We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance."
Researcher Affiliation | Academia | Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik (UC Berkeley)
Pseudocode | Yes | "Algorithm 1: Tracking Algorithm"
Open Source Code | Yes | "Code and results are available at: https://brjathu.github.io/T3DP"
Open Datasets | Yes | "We evaluate our algorithm on three different datasets: PoseTrack (1), MuPoTS (27) and AVA (13). ... We train this method using images from COCO (24), MPII (2) and Human3.6M (15)."
Dataset Splits | No | The paper mentions using training and test sets from specific datasets (e.g., "training set of PoseTrack (1)", "test split from (30)"), but does not provide the train/validation/test percentages, sample counts, or splitting methodology needed to reproduce its data partitioning.
Hardware Specification | Yes | "All experiments are conducted on a single RTX 2080 Ti."
Software Dependencies | No | The paper mentions a pretrained HMR model and a ResNet-50 backbone, but does not give version numbers for the programming languages, libraries, or frameworks used (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "We train the appearance head for roughly 500k iterations with a learning rate of 0.0001 and a batch size of 16 images while keeping the pose head frozen. ... training lasts for about 100k iterations, with a learning rate of 0.0001. Finally, for training the transformer, we use the training set of PoseTrack (1)... Training lasts for 10k iterations with a learning rate of 0.001."
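The reported training hyperparameters can be gathered into a single configuration sketch. This is a minimal illustration only: the stage names and dict layout are assumptions, not from the paper, and the head trained in the second 100k-iteration stage is not identified in the quoted excerpt.

```python
# Hypothetical configuration assembled from the hyperparameters quoted above.
# Stage names and structure are illustrative; the paper specifies no config format.
TRAINING_STAGES = {
    "appearance_head": {      # pose head kept frozen during this stage
        "iterations": 500_000,
        "learning_rate": 1e-4,
        "batch_size": 16,
    },
    "second_stage": {         # which head is trained is elided in the excerpt
        "iterations": 100_000,
        "learning_rate": 1e-4,
    },
    "transformer": {          # trained on the PoseTrack training set
        "iterations": 10_000,
        "learning_rate": 1e-3,
    },
}

for stage, cfg in TRAINING_STAGES.items():
    print(stage, cfg)
```

Keeping the three stages in one structure makes the learning-rate difference explicit: both head-training stages use 0.0001, while the transformer stage uses the larger rate of 0.001 for far fewer iterations.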