Video-based Human-Object Interaction Detection from Tubelet Tokens

Authors: Danyang Tu, Wei Sun, Xiongkuo Min, Guangtao Zhai, Wei Shen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness and efficiency of TUTOR are verified by extensive experiments. Results show our method outperforms existing works by large margins, with a relative mAP gain of 16.14% on VidHOI and a 2-point gain on CAD-120, as well as a 4× speedup.
Researcher Affiliation | Academia | 1 Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; 2 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. {danyangtu, sunguwei, minxiongkuo, zhaiguangtao, wei.shen}@sjtu.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is provided in supplementary material.
Open Datasets | Yes | We conduct experiments on VidHOI [5] and CAD-120 [22] benchmarks to evaluate the proposed methods by following the standard scheme. VidHOI is a large-scale dataset for V-HOI detection, comprising 6,366 videos for training and 756 videos for validation. ...CAD-120 is a relatively smaller dataset that consists of 120 RGB-D videos.
Dataset Splits | Yes | VidHOI is a large-scale dataset for V-HOI detection, comprising 6,366 videos for training and 756 videos for validation.
Hardware Specification | Yes | A batch size of 16 on 8 RTX-2080Ti GPUs, and learning rates lr = 2.5e-4 for the Transformer and 1e-5 for the FPN are used.
Software Dependencies | No | The paper mentions using an 'AdamW [31] optimizer' but does not specify versions for any software dependencies such as PyTorch, TensorFlow, CUDA, or Python.
Experiment Setup | Yes | The dimension of the HOI query is set to 256... The number of queries is set to 100 for VidHOI and 50 for CAD-120... We employed an AdamW [31] optimizer for 150 epochs. A batch size of 16... learning rates lr = 2.5e-4 for the Transformer and 1e-5 for the FPN are used. The lr is decayed by half at the 50-th, 90-th and 120-th epochs, respectively. We use lr = 1e-6 to warm up the training for the first 5 epochs, and then go back to 2.5e-4 and continue training.
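The optimization schedule quoted in the Experiment Setup row (AdamW, per-module learning rates, halving at fixed milestone epochs, and a constant warm-up) can be expressed compactly in a PyTorch-style sketch. The code below is illustrative and not taken from the authors' released code: the `transformer` and `fpn` stand-in modules, the weight-decay value, and the choice to apply the same warm-up factor to both parameter groups are assumptions; only the learning rates, milestone epochs, warm-up length, and epoch count come from the paper.

```python
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Stand-in sub-modules: in the real model these would be the HOI Transformer
# and the FPN backbone neck; the layers and shapes here are placeholders only.
transformer = nn.Linear(256, 256)
fpn = nn.Conv2d(256, 256, kernel_size=1)

BASE_LR_TRANSFORMER = 2.5e-4    # paper: lr for the Transformer
BASE_LR_FPN = 1e-5              # paper: lr for the FPN
WARMUP_LR = 1e-6                # paper: warm-up lr for the first 5 epochs
WARMUP_EPOCHS = 5
MILESTONES = (50, 90, 120)      # paper: lr halved at these epochs
TOTAL_EPOCHS = 150

optimizer = AdamW(
    [
        {"params": transformer.parameters(), "lr": BASE_LR_TRANSFORMER},
        {"params": fpn.parameters(), "lr": BASE_LR_FPN},
    ],
    weight_decay=1e-4,  # assumed value; not stated in the quoted setup
)

def lr_factor(epoch: int) -> float:
    """Multiplier applied to every group's base lr at a given epoch."""
    if epoch < WARMUP_EPOCHS:
        # Constant warm-up; applying one shared factor to both groups is a
        # simplification of the paper's "lr = 1e-6" warm-up statement.
        return WARMUP_LR / BASE_LR_TRANSFORMER
    # After warm-up, halve the lr once for every milestone already passed.
    return 0.5 ** sum(epoch >= m for m in MILESTONES)

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(TOTAL_EPOCHS):
    # ... run one training epoch over the VidHOI / CAD-120 loader (omitted) ...
    scheduler.step()
```

Under this schedule the Transformer group's lr follows 1e-6 during warm-up, returns to 2.5e-4 at epoch 5, then drops to 1.25e-4 at epoch 50, 6.25e-5 at epoch 90, and 3.125e-5 at epoch 120, matching the quoted halving milestones.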