Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge
Authors: Seongyeop Kim, Hyung-Il Kim, Yong Man Ro
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Datasets: CIFAR10 (C10): comprises 10 image classes with 50,000 training and 10,000 testing images; six classes were designated as known for the OSR task, leaving the remaining four as the open set (Krizhevsky, Hinton et al. 2009). CIFAR+N (C+N): this setup uses four known classes from CIFAR10 and a variable number of unknown classes from CIFAR100, creating a more complex OSR problem as the number of unknown classes increases. Tiny ImageNet (TI): a downscaled version of the ImageNet dataset with 200 classes; in the experiments, 20 were used as known classes and the remaining 180 were treated as unknown (Le and Yang 2015; Deng et al. 2009). Metrics: AUC (Area Under the ROC Curve): ranging from 0 to 100%, this metric assesses the classifier's ability to differentiate between known and unknown classes by quantifying the trade-off between sensitivity and specificity across thresholds (Phillips, Grother, and Micheals 2011). F1 Score: the balance between precision and recall, expressed as a percentage; it is especially valuable on imbalanced datasets, capturing the trade-off between false positives and false negatives (Hand and Christen 2018). OSCR (Open Set Classification Rate): adapted from the Detection and Identification Rate (DIR) curve, the OSCR curve plots the Correct Classification Rate (CCR) against the False Positive Rate (FPR) for known and unknown classes, providing a nuanced evaluation of accuracy in open set scenarios (Dhamija, Günther, and Boult 2018). (Illustrative sketches of the class split and metric computation follow the table.) |
| Researcher Affiliation | Collaboration | Seongyeop Kim (Integrated Vision Language Lab., KAIST, South Korea), Hyung-Il Kim (ETRI, South Korea), Yong Man Ro (Integrated Vision Language Lab., KAIST, South Korea); emails: seongyeop@kaist.ac.kr, hikim@etri.re.kr, ymro@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: Training for OSR with Visual Prompts Distilled from Common-Sense Knowledge |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | CIFAR10 (C10): comprises 10 image classes with 50,000 training and 10,000 testing images; six classes were designated as known for the OSR task, leaving the remaining four as the open set (Krizhevsky, Hinton et al. 2009). Tiny ImageNet (TI): a downscaled version of the ImageNet dataset with 200 classes; in the experiments, 20 were used as known classes and the remaining 180 were treated as unknown (Le and Yang 2015; Deng et al. 2009). |
| Dataset Splits | No | The paper states training and testing image counts for CIFAR10 (50,000 training and 10,000 testing) but does not specify a validation split (e.g., as percentages or absolute counts). For Tiny ImageNet, it describes the known/unknown class division but not the dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU or CPU models) used to run the experiments. It only mentions the 'VGG32 architecture', which is a model architecture, not hardware. |
| Software Dependencies | No | The paper mentions incorporating 'the OPT-2.7B model from BLIP2 (Li et al. 2023b)' but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We have utilized a constant visual prompt size of P set at 30. |
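
The known/unknown class split described in the Research Type and Open Datasets rows can be reproduced along the following lines. This is a minimal sketch, not the authors' code: the specific class indices chosen as known are an assumption, since the paper excerpt does not list them.

```python
# Sketch of the CIFAR10 known/unknown split for OSR: 6 known classes, 4 open-set classes.
# The particular class indices below are illustrative, not taken from the paper.
import numpy as np
from torchvision.datasets import CIFAR10

known_classes = [0, 1, 2, 3, 4, 5]   # assumed choice of 6 known classes
unknown_classes = [6, 7, 8, 9]       # remaining 4 classes form the open set

train_set = CIFAR10(root="./data", train=True, download=True)
test_set = CIFAR10(root="./data", train=False, download=True)

train_targets = np.array(train_set.targets)
test_targets = np.array(test_set.targets)

# Train only on samples whose label belongs to a known class.
train_idx = np.where(np.isin(train_targets, known_classes))[0]

# At test time, known-class samples should be classified correctly,
# while unknown-class samples should be rejected by the OSR model.
test_known_idx = np.where(np.isin(test_targets, known_classes))[0]
test_unknown_idx = np.where(np.isin(test_targets, unknown_classes))[0]
```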
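The AUC, F1, and OSCR metrics quoted in the Research Type row can be computed from a per-sample "knownness" score (for example, the maximum softmax probability). The sketch below is an illustrative re-implementation under that assumption, not the paper's evaluation code; note that in full OSR evaluation the F1 score typically also accounts for a rejected/unknown class, while this sketch reports the closed-set macro F1.

```python
# Sketch of AUC, macro F1, and OSCR computation from known/unknown confidence scores.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def osr_metrics(known_scores, known_preds, known_labels, unknown_scores):
    """known_scores/unknown_scores: confidence that a sample is from a known class.
    known_preds/known_labels: predicted and true closed-set labels for known samples."""
    known_scores = np.asarray(known_scores)
    unknown_scores = np.asarray(unknown_scores)
    known_preds = np.asarray(known_preds)
    known_labels = np.asarray(known_labels)

    # AUC: separability of known (label 1) vs. unknown (label 0) samples.
    y_true = np.concatenate([np.ones_like(known_scores), np.zeros_like(unknown_scores)])
    y_score = np.concatenate([known_scores, unknown_scores])
    auc = 100.0 * roc_auc_score(y_true, y_score)

    # Macro F1 over the known classes (closed-set classification quality).
    f1 = 100.0 * f1_score(known_labels, known_preds, average="macro")

    # OSCR: sweep a rejection threshold; CCR is the fraction of known samples that are
    # both accepted and correctly classified, FPR is the fraction of unknown samples
    # that are (wrongly) accepted. The area under the CCR-vs-FPR curve is the OSCR value.
    thresholds = np.sort(np.concatenate([known_scores, unknown_scores]))
    correct = (known_preds == known_labels)
    ccr = np.array([(correct & (known_scores >= t)).mean() for t in thresholds])
    fpr = np.array([(unknown_scores >= t).mean() for t in thresholds])
    order = np.argsort(fpr)
    oscr = 100.0 * np.trapz(ccr[order], fpr[order])
    return auc, f1, oscr
```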
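The Software Dependencies row notes that the paper incorporates the OPT-2.7B language model from BLIP-2 without stating library versions. A hedged sketch of loading that model through the HuggingFace transformers library is shown below; the checkpoint name, prompt text, and image path are assumptions, and this is not the paper's knowledge-distillation pipeline.

```python
# Sketch: generating a common-sense style image description with BLIP-2 (OPT-2.7B)
# via HuggingFace transformers. Checkpoint, prompt, and image path are illustrative.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg")  # placeholder image path
prompt = "Question: what object is in the image and what is it used for? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=30)
description = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(description)
```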
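The Experiment Setup row quotes a constant visual prompt size P of 30. The excerpt does not specify how the prompt is attached to the input; the sketch below assumes the common pixel-space convention of a learnable border of width P added to the image (the 224x224 input size is likewise an assumption) and is not the authors' implementation.

```python
# Sketch of a padding-style visual prompt of width P = 30 (assumed form, assumed input size).
import torch
import torch.nn as nn

class PadPrompter(nn.Module):
    def __init__(self, prompt_size: int = 30, image_size: int = 224):
        super().__init__()
        inner = image_size - 2 * prompt_size
        # Learnable border strips: top/bottom span the full width,
        # left/right fill the remaining interior height.
        self.top = nn.Parameter(torch.randn(1, 3, prompt_size, image_size))
        self.bottom = nn.Parameter(torch.randn(1, 3, prompt_size, image_size))
        self.left = nn.Parameter(torch.randn(1, 3, inner, prompt_size))
        self.right = nn.Parameter(torch.randn(1, 3, inner, prompt_size))
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble a full-size prompt whose interior is zero and whose border
        # of width `prompt_size` is learnable, then add it to the input image.
        middle = torch.zeros(1, 3, self.inner, self.inner, device=x.device)
        rows = torch.cat([self.left, middle, self.right], dim=3)
        prompt = torch.cat([self.top, rows, self.bottom], dim=2)
        return x + prompt
```

A border-style prompt leaves the image content untouched while giving the optimizer a fixed budget of learnable pixels, which is one common way a single scalar "prompt size" hyperparameter is realized in pixel space.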