End-to-end Active Object Tracking via Reinforcement Learning
Authors: Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The tracker trained in simulators (ViZDoom, Unreal Engine) shows good generalization in the case of unseen object moving path, unseen object appearance, unseen background, and distracting object. It can restore tracking when occasionally losing the target. With the experiments over the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios. |
| Researcher Affiliation | Collaboration | ¹Tencent AI Lab, ²Peking University. |
| Pseudocode | No | The paper describes algorithms and network architecture in text and diagrams but does not include explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Finally, we perform qualitative evaluation on some video clips taken from the VOT dataset (Kristan et al., 2016). |
| Dataset Splits | No | The paper mentions 'best validation result' but does not specify how validation sets were created, their sizes, or any other details for reproducibility of the validation split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware components (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like ViZDoom, Unreal Engine, UnrealCV, OpenAI Gym, OpenCV, and Dlib, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | To be more specific, the tracker observes the raw visual state and takes one action from the action set A = {turn-left, turn-right, turn-left-and-move-forward, turn-right-and-move-forward, move-forward, no-op}... The screen is resized to 84 × 84 × 3 RGB image as the network input... √(x² + (y − d)²), where A > 0, c > 0, d > 0, λ > 0 are tuning parameters... we let the reward threshold be -450 and the maximum length be 3000, respectively... This map is then augmented as described in Sec. 3.5 with N = 21. |
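
The experiment-setup quote above names the discrete action set, the 84 × 84 × 3 RGB network input, a reward built around the distance term √(x² + (y − d)²) with tuning parameters A, c, d, λ, and the episode caps (reward threshold -450, maximum length 3000). Below is a minimal Python sketch of that setup; the function names, the default parameter values, the |ω| orientation-penalty term, and the /255 normalization are illustrative assumptions and not the authors' released code.

```python
import math

import cv2
import numpy as np

# Discrete action set quoted from the paper's experiment setup.
ACTIONS = [
    "turn-left",
    "turn-right",
    "turn-left-and-move-forward",
    "turn-right-and-move-forward",
    "move-forward",
    "no-op",
]

# Episode caps quoted from the paper's experiment setup.
REWARD_THRESHOLD = -450    # end an episode early once cumulative reward drops below this
MAX_EPISODE_LENGTH = 3000  # maximum number of steps per episode


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Resize the raw screen to the 84 x 84 x 3 RGB network input.

    The float conversion and /255 scaling are assumptions, not stated in the paper.
    """
    return cv2.resize(frame, (84, 84)).astype(np.float32) / 255.0


def tracking_reward(x: float, y: float, omega: float,
                    A: float = 1.0, c: float = 1.0, d: float = 2.0,
                    lam: float = 0.1) -> float:
    """Reward shaped around the quoted term sqrt(x^2 + (y - d)^2).

    (x, y) is the target position in the tracker's local frame and omega its
    relative orientation; the exact functional form and the default values of
    A, c, d, lam are illustrative assumptions.
    """
    return A - (math.sqrt(x ** 2 + (y - d) ** 2) / c + lam * abs(omega))
```

Under this shaping, the reward peaks at A when the target sits at distance d straight ahead of the tracker (x = 0, y = d, ω = 0) and decreases as the target drifts away or rotates out of alignment.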