Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Authors: Zhu Wang, Sourav Medya, Sathya Ravi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that it is possible to design models that perform similarly to state-of-the-art results but with significantly fewer samples and less training time. Our models and code are available here: https://github.com/ellenzhuwang/implicit_vkood |
| Researcher Affiliation | Academia | Zhu Wang, Sourav Medya, Sathya N. Ravi, Department of Computer Science, University of Illinois at Chicago, {zwang260,medya,sathya}@uic.edu |
| Pseudocode | Yes | Algorithm 1: Fixed Point Network Operator based OOD Detection Layer for Language Features l_j (see the illustrative sketch after this table) |
| Open Source Code | Yes | Our models and code are available here: https://github.com/ellenzhuwang/implicit_vkood |
| Open Datasets | Yes | We pre-trained on three datasets, including COCO [35], Visual Genome [28], and SBU Captions [47], with a total of 1M images and 6.8M image-caption pairs, approximately 30% less than the baseline (ViLT). |
| Dataset Splits | No | The paper names the datasets used for training, fine-tuning, and testing (e.g., the VQAv2 test set and the COCO val set), implying the standard benchmark splits. However, it does not give explicit split percentages or counts for full reproducibility, stating only 'Following standard practice in Vision' for training strategies. |
| Hardware Specification | Yes | We pre-trained and fine-tuned both on 8 NVIDIA RTX 2080Ti GPUs, and for inference we used 1 NVIDIA RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., RoBERTa, ViT-B/32, CLIP, BLIP, BERT-base, AdamW optimizer) but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Network training. We pre-trained the model for 10 epochs using AdamW optimizer [39] with learning rate of 1e-4 and weight decay of 1e-2. We chose the warm-up phase of learning rate to be 10% of the total training steps, and the learning rate was decayed linearly to 0 afterwards. Then, we fine-tuned our model for 5 epochs with learning rate of 2e-4 for all downstream tasks. In addition, we applied RandAugment [12] as augmentation strategy in fine-tuning steps. |
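
For context on the Pseudocode row, here is a minimal, generic sketch of a fixed-point ("implicit") layer in PyTorch, written in the spirit of the Algorithm 1 caption quoted above. The update rule `z = tanh(Wz + Ux)`, the residual-norm OOD score, and the feature width are illustrative assumptions, not the paper's actual operator, and the sketch differentiates by simply unrolling the iteration rather than by implicit differentiation.

```python
import torch
import torch.nn as nn

class FixedPointLayer(nn.Module):
    """Generic fixed-point layer: solves z = tanh(Wz + Ux) by naive iteration."""
    def __init__(self, dim, max_iter=30, tol=1e-4):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.U = nn.Linear(dim, dim)
        self.max_iter = max_iter
        self.tol = tol

    def forward(self, x):
        z = torch.zeros_like(x)
        for _ in range(self.max_iter):
            z_next = torch.tanh(self.W(z) + self.U(x))
            if (z_next - z).norm() < self.tol:  # stop once the iterate stabilizes
                return z_next
            z = z_next
        return z

def ood_score(z, x):
    # Hypothetical score: distance between the fixed point and the input features.
    return (z - x).norm(dim=-1)

# Usage on placeholder "language features" l_j of width 768
layer = FixedPointLayer(768)
l_j = torch.randn(4, 768)
print(ood_score(layer(l_j), l_j))
```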
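
The Experiment Setup row is concrete enough to reproduce the optimizer and learning-rate schedule. Below is a minimal PyTorch sketch of that setup; the placeholder model, `total_steps`, and the training loop body are assumptions, since the quoted text does not specify them.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(768, 768)      # placeholder for the multimodal model
total_steps = 10_000                   # placeholder; depends on dataset and batch size
warmup_steps = int(0.1 * total_steps)  # warm-up is 10% of the total training steps

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warm-up
    # linear decay to 0 after the warm-up phase
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Pre-training: AdamW, lr 1e-4, weight decay 1e-2 (fine-tuning reuses this with lr 2e-4)
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

for step in range(total_steps):
    # ... forward pass and loss.backward() go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```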