Video-based Human-Object Interaction Detection from Tubelet Tokens
Authors: Danyang Tu, Wei Sun, Xiongkuo Min, Guangtao Zhai, Wei Shen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and efficiency of TUTOR are verified by extensive experiments. Results show our method outperforms existing works by large margins, with a relative mAP gain of 16.14% on VidHOI and a 2 points gain on CAD-120 as well as a 4× speedup. |
| Researcher Affiliation | Academia | 1 Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; 2 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. {danyangtu, sunguwei, minxiongkuo, zhaiguangtao, wei.shen}@sjtu.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is provided in supplementary material. |
| Open Datasets | Yes | We conduct experiments on VidHOI [5] and CAD-120 [22] benchmarks to evaluate the proposed methods by following the standard scheme. VidHOI is a large-scale dataset for V-HOI detection, comprising 6,366 videos for training and 756 videos for validation. ...CAD-120 is a relatively smaller dataset that consists of 120 RGB-D videos. |
| Dataset Splits | Yes | VidHOI is a large-scale dataset for V-HOI detection, comprising 6,366 videos for training and 756 videos for validation. |
| Hardware Specification | Yes | A batch size of 16 on 8 RTX-2080Ti GPUs, and learning rate lr = 2.5e-4 for Transformer and 1e-5 for FPN are used. |
| Software Dependencies | No | The paper mentions using an 'AdamW [31] optimizer' but does not specify versions for software dependencies such as PyTorch, TensorFlow, CUDA, or Python. |
| Experiment Setup | Yes | The dimension of HOI query is set to 256... The number of queries is set to 100 for VidHOI and 50 for CAD-120... We employed an AdamW [31] optimizer for 150 epochs. A batch size of 16... learning rate lr = 2.5e-4 for Transformer and 1e-5 for FPN are used. The lr decayed by half at the 50th, 90th and 120th epochs, respectively. We use lr = 1e-6 to warm up the training for the first 5 epochs, and then go back to 2.5e-4 and continue training. |
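
The optimizer and learning-rate schedule quoted in the Experiment Setup row can be expressed compactly. The sketch below is a hedged PyTorch reconstruction, not the authors' released code: the parameter-group names, the `weight_decay` value, and the use of `LambdaLR` are illustrative assumptions; only the learning rates, warmup, milestones, epoch count, and batch size come from the paper.

```python
# Hedged sketch of the training schedule described in the Experiment Setup row.
# transformer_params / fpn_params are placeholder names, not the authors' code.
import torch


def build_optimizer(transformer_params, fpn_params):
    # Two parameter groups with the learning rates reported in the paper:
    # 2.5e-4 for the Transformer and 1e-5 for the FPN, optimized with AdamW.
    return torch.optim.AdamW(
        [
            {"params": transformer_params, "lr": 2.5e-4},
            {"params": fpn_params, "lr": 1e-5},
        ],
        weight_decay=1e-4,  # assumption: weight decay is not stated in the quote
    )


def lr_scale(epoch):
    # Warm up at lr = 1e-6 for the first 5 epochs (scale given relative to the
    # Transformer base lr of 2.5e-4), then return to the base lr and halve it
    # at epochs 50, 90, and 120.
    if epoch < 5:
        return 1e-6 / 2.5e-4
    scale = 1.0
    for milestone in (50, 90, 120):
        if epoch >= milestone:
            scale *= 0.5
    return scale


# Usage: one scheduler step per epoch over the 150-epoch run.
# optimizer = build_optimizer(transformer.parameters(), fpn.parameters())
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)
# for epoch in range(150):
#     train_one_epoch(...)  # batch size 16 across 8 RTX-2080Ti GPUs
#     scheduler.step()
```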