End-to-end Active Object Tracking via Reinforcement Learning

Authors: Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The tracker trained in simulators (ViZDoom, Unreal Engine) shows good generalization in the case of unseen object moving path, unseen object appearance, unseen background, and distracting object. It can restore tracking when occasionally losing the target. With the experiments over the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.
Researcher Affiliation | Collaboration | Tencent AI Lab and Peking University.
Pseudocode | No | The paper describes algorithms and network architecture in text and diagrams but does not include explicit pseudocode blocks or sections labeled 'Algorithm'.
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Finally, we perform qualitative evaluation on some video clips taken from the VOT dataset (Kristan et al., 2016).
Dataset Splits | No | The paper mentions 'best validation result' but does not specify how validation sets were created, their sizes, or other details needed to reproduce the split.
Hardware Specification | No | The paper does not explicitly describe the specific hardware components (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions software like ViZDoom, Unreal Engine, UnrealCV, OpenAI Gym, OpenCV, and Dlib, but it does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | To be more specific, the tracker observes the raw visual state and takes one action from the action set A = {turn-left, turn-right, turn-left-and-move-forward, turn-right-and-move-forward, move-forward, no-op}... The screen is resized to an 84 × 84 × 3 RGB image as the network input... the reward is r = A − (√(x² + (y − d)²)/c + λ|ω|), where A > 0, c > 0, d > 0, λ > 0 are tuning parameters... we let the reward threshold be -450 and the maximum length be 3000, respectively... This map is then augmented as described in Sec. 3.5 with N = 21.
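
As a concrete reading of that setup, the Python sketch below enumerates the six-action set, resizes a raw frame to the 84 × 84 × 3 network input, and evaluates a reward of the quoted form. It is a minimal sketch, not the authors' implementation: the `Action` member names, the default values of A, c, d, and λ, and the interpretation of the -450 threshold as an early-termination condition on cumulative episode reward are assumptions for illustration.

```python
"""Sketch of the quoted experiment setup (action set, input size, reward shape).

Assumptions, not taken from the paper: Action member names, the default
parameter values of the reward, and reading the -450 threshold as an
early-termination condition on the cumulative episode reward.
"""
from enum import Enum
import math

import cv2          # the paper mentions OpenCV; used here only to resize frames
import numpy as np


class Action(Enum):
    """The six-element action set A quoted in the experiment setup."""
    TURN_LEFT = 0
    TURN_RIGHT = 1
    TURN_LEFT_AND_MOVE_FORWARD = 2
    TURN_RIGHT_AND_MOVE_FORWARD = 3
    MOVE_FORWARD = 4
    NO_OP = 5


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Resize a raw RGB frame to the 84 x 84 x 3 network input."""
    return cv2.resize(frame, (84, 84), interpolation=cv2.INTER_AREA)


def reward(x: float, y: float, omega: float,
           A: float = 1.0, c: float = 1.0, d: float = 1.0, lam: float = 1.0) -> float:
    """r = A - (sqrt(x^2 + (y - d)^2) / c + lam * |omega|).

    (x, y) is the target position and omega its orientation in the tracker's
    local frame; A, c, d, lam > 0 are tuning parameters (the defaults here are
    placeholders, not the paper's values).
    """
    return A - (math.sqrt(x ** 2 + (y - d) ** 2) / c + lam * abs(omega))


# Quoted stopping criteria: reward threshold -450, maximum episode length 3000.
MAX_EPISODE_LENGTH = 3000
REWARD_THRESHOLD = -450.0


def episode_done(step: int, cumulative_reward: float) -> bool:
    """End the episode at the length cap or once the return drops below the threshold."""
    return step >= MAX_EPISODE_LENGTH or cumulative_reward <= REWARD_THRESHOLD


if __name__ == "__main__":
    dummy_frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a simulator frame
    obs = preprocess(dummy_frame)
    print(obs.shape, Action.MOVE_FORWARD, reward(x=0.0, y=1.0, omega=0.0))
```

Under this reward shape, the return peaks at A when the target sits at the expected distance d directly in front of the tracker with zero relative orientation, which is what makes the shaping suitable for active tracking.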