Evaluating and Improving Interactions with Hazy Oracles
Authors: Stephan J. Lemmer, Jason J. Corso
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this new formalization and an innovative deferred inference method on the disparate tasks of Single-Target Video Object Tracking and Referring Expression Comprehension, ultimately reducing error by up to 48% without any change to the underlying model or its parameters. |
| Researcher Affiliation | Academia | University of Michigan, Ann Arbor, Michigan, USA lemmersj@umich.edu, jjcorso@umich.edu |
| Pseudocode | Yes | Algorithm 1: Calculating DEV<br>DEV ← 0; DDC ← 1<br>while DDC ≤ 10 do<br>&nbsp;&nbsp;tasks ← draw_tasks()<br>&nbsp;&nbsp;DEV ← DEV + calc_error(tasks) / (10(len(tasks)+1))<br>&nbsp;&nbsp;N ← 0<br>&nbsp;&nbsp;while N < len(tasks) do<br>&nbsp;&nbsp;&nbsp;&nbsp;cur_task ← find_task_to_defer(tasks, DDC)<br>&nbsp;&nbsp;&nbsp;&nbsp;response ← get_new_input(cur_task)<br>&nbsp;&nbsp;&nbsp;&nbsp;updated_task ← aggregate_fn(cur_task, response)<br>&nbsp;&nbsp;&nbsp;&nbsp;update_tasks(tasks, updated_task)<br>&nbsp;&nbsp;&nbsp;&nbsp;DEV ← DEV + calc_error(tasks) / (10(len(tasks)+1))<br>&nbsp;&nbsp;&nbsp;&nbsp;N ← N + 1<br>&nbsp;&nbsp;end while<br>&nbsp;&nbsp;DDC ← DDC + 1<br>end while |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described in this paper. |
| Open Datasets | Yes | Since it is the only VOT dataset, to our knowledge, that contains multiple annotations per tracked object, we perform our analysis using the crowdsourced data from Lemmer et al. (Lemmer, Song, and Corso 2021). This dataset consists of nine first-frame annotations for every video in the OTB-100 dataset (Wu, Lim, and Yang 2013). ... For the task model, our evaluation uses the UNITER architecture (Chen et al. 2020), which formulates referring expression comprehension as classification over a set of externally-provided bounding boxes. ... We train and evaluate on the RefCOCO (Kazemzadeh et al. 2014) dataset because it contains multiple references to all but one target object... |
| Dataset Splits | Yes | We maintain the val, test A, and test B splits from previous works (Yu et al. 2016), but note our evaluation measures per-task performance instead of per-phrase performance, making it incorrect to directly compare our results to other evaluations. |
| Hardware Specification | Yes | Our model is trained on a single GeForce GTX Titan XP GPU using the training settings given by the original authors with a few small modifications: we use full precision floating point operations, adjust the batch size from 128 to 64, and accumulate gradients over two steps. |
| Software Dependencies | No | The paper mentions using 'Scikit-Learn' but does not specify a version number for it or for other key software components like deep learning frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For VOT, we use DBSCAN with epsilon 10 and minimum samples 20, and for our method, samples are scattered by adding a normally-distributed random value with standard deviation 7 to every dimension. For Referring Expression Comprehension, we use full precision floating point operations, adjust the batch size from 128 to 64, and accumulate gradients over two steps, performing Monte Carlo dropout with 100 passes. |
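The Algorithm 1 pseudocode quoted in the table can be sketched in Python. This is a minimal toy reconstruction, not the paper's implementation: `draw_tasks`, `find_task_to_defer`, `get_new_input`, `aggregate_fn`, and `calc_error` are hypothetical stubs standing in for the paper's task sampling, deferral policy, oracle query, aggregation, and error metric.

```python
import random

# --- Hypothetical stubs: each toy "task" is just an error value in [0, 1]. ---
def draw_tasks():
    return [random.random() for _ in range(5)]

def find_task_to_defer(tasks, ddc):
    # Toy deferral policy: pick the task with the highest current error.
    return max(range(len(tasks)), key=lambda i: tasks[i])

def get_new_input(task):
    return task * 0.5  # a fresh oracle input halves the toy task's error

def aggregate_fn(task, response):
    return response

def calc_error(tasks):
    return sum(tasks) / len(tasks)

def calc_dev(max_ddc=10, seed=0):
    """Deferred Error Volume (DEV), following the structure of Algorithm 1:
    average error over deferral-decision counts (DDC) and deferral steps."""
    random.seed(seed)
    dev = 0.0
    for ddc in range(1, max_ddc + 1):
        tasks = draw_tasks()
        dev += calc_error(tasks) / (max_ddc * (len(tasks) + 1))
        for _ in range(len(tasks)):
            i = find_task_to_defer(tasks, ddc)
            response = get_new_input(tasks[i])
            tasks[i] = aggregate_fn(tasks[i], response)
            dev += calc_error(tasks) / (max_ddc * (len(tasks) + 1))
    return dev

print(calc_dev())
```

Because every error term is divided by the total number of terms, `10(len(tasks)+1)`, the returned DEV is a weighted average of per-step errors and stays in [0, 1] for this toy setup.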
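The experiment-setup row describes scattering samples with normally-distributed noise (standard deviation 7 per dimension) before density-based clustering. A stdlib-only sketch of that scattering step follows; the example boxes and copy count are hypothetical, and the subsequent clustering would use something like scikit-learn's `DBSCAN(eps=10, min_samples=20)`, which is not shown here.

```python
import random

def scatter(sample, sigma, rng):
    """Add N(0, sigma) noise to every dimension of one sample, mirroring
    the scattering described in the VOT experiment setup."""
    return [x + rng.gauss(0.0, sigma) for x in sample]

def scatter_all(samples, n_copies, sigma=7.0, seed=0):
    """Produce n_copies noisy copies of each sample (e.g. a 4-D bounding
    box [x, y, w, h]) as input points for density-based clustering."""
    rng = random.Random(seed)
    return [scatter(s, sigma, rng) for s in samples for _ in range(n_copies)]

# Hypothetical first-frame boxes from two annotators.
boxes = [[100.0, 50.0, 40.0, 30.0], [102.0, 51.0, 38.0, 29.0]]
scattered = scatter_all(boxes, n_copies=20)
print(len(scattered))  # 40 scattered points to feed into clustering
```

Seeding a local `random.Random` keeps the scattering reproducible without touching the global random state.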