Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Typicalness-Aware Learning for Failure Detection
Authors: Yijun Liu, Jiequan Cui, Zhuotao Tian, Senqiao Yang, Qingdong He, Xiaoling Wang, Jingyong Su
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TAL has been extensively evaluated on benchmark datasets, and the results demonstrate its superiority over existing failure detection methods. |
| Researcher Affiliation | Collaboration | Yijun Liu¹, Jiequan Cui², Zhuotao Tian¹, Senqiao Yang³, Qingdong He⁴, Xiaoling Wang¹, Jingyong Su¹ (liuyijun@stu.hit.edu.cn); ¹Harbin Institute of Technology (Shenzhen), ²Nanyang Technological University, ³The Chinese University of Hong Kong, ⁴Tencent Youtu Lab |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/liuyijungoon/TAL. |
| Open Datasets | Yes | Datasets and models. We first evaluate on the small-scale CIFAR-100 [21] dataset with SVHN [11] as its out-of-distribution (OOD) test set. To demonstrate scalability, we further conduct experiments on large-scale ImageNet [5] using ResNet-50, with Textures [3] and WILDS [20] serving as OOD data. |
| Dataset Splits | Yes | The original CIFAR-100 dataset consists of 50,000 training images, with 5,000 images reserved for validation and the remaining 45,000 images used for training. |
| Hardware Specification | Yes | The models are trained for 200 epochs with a batch size of 256 on a single NVIDIA GeForce RTX 3090 GPU. On ImageNet [5], we use the ResNet-50 architecture as our backbone. The models are trained for 90 epochs with an initial learning rate of 0.1 on a single NVIDIA A100. |
| Software Dependencies | No | The paper mentions software components like 'SGD optimizer', 'Cosine Annealing LR scheduler', and 'timm library', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For experiments on the CIFAR [21], we employ an SGD optimizer with an initial learning rate of 0.1, a momentum of 0.9, and a weight decay of 0.0005. The models are trained for 200 epochs with a batch size of 256 on a single NVIDIA GeForce RTX 3090 GPU. Furthermore, we adopt a Cosine Annealing LR scheduler to adjust the learning rate during training. On ImageNet [5], we use the ResNet-50 architecture as our backbone. The models are trained for 90 epochs with an initial learning rate of 0.1 on a single NVIDIA A100. The learning rate is decayed by a factor of 0.1 every 30 epochs. ... where we empirically set T_max and T_min to 10 and 100, and they perform well on different benchmarks. |
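The two learning-rate schedules quoted in the Experiment Setup row can be sketched in closed form. This is a minimal, framework-free illustration, not the authors' code: it assumes the schedule is stepped once per epoch, and it assumes a cosine floor (`eta_min`) of 0, since neither detail is reported. The function names are hypothetical.

```python
import math

# Values reported in the Experiment Setup row.
CIFAR_BASE_LR = 0.1
CIFAR_EPOCHS = 200
IMAGENET_BASE_LR = 0.1
IMAGENET_STEP = 30      # decay every 30 epochs
IMAGENET_GAMMA = 0.1    # decay factor


def cosine_annealing_lr(epoch: int,
                        base_lr: float = CIFAR_BASE_LR,
                        total_epochs: int = CIFAR_EPOCHS,
                        eta_min: float = 0.0) -> float:
    """Cosine annealing without restarts (hypothetical helper).

    eta_min = 0 and per-epoch stepping are assumptions; the paper
    reports neither.
    """
    progress = epoch / total_epochs
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))


def step_decay_lr(epoch: int,
                  base_lr: float = IMAGENET_BASE_LR,
                  step_size: int = IMAGENET_STEP,
                  gamma: float = IMAGENET_GAMMA) -> float:
    """Multiply the learning rate by gamma every step_size epochs (hypothetical helper)."""
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    for e in (0, 50, 100, 150, 199):
        print(f"CIFAR    epoch {e:3d}: lr = {cosine_annealing_lr(e):.5f}")
    for e in (0, 29, 30, 60, 89):
        print(f"ImageNet epoch {e:3d}: lr = {step_decay_lr(e):.5f}")
```

In PyTorch, the same behavior would most likely come from `torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4)` combined with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)` for CIFAR, or `StepLR(optimizer, step_size=30, gamma=0.1)` for ImageNet. Note that the scheduler argument `T_max` is the annealing length in scheduler steps and is unrelated to the paper's T_max and T_min hyperparameters.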
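The Dataset Splits row reports a 45,000/5,000 train/validation partition of CIFAR-100's 50,000 training images, but not how the 5,000 validation images are chosen. The sketch below assumes a uniform random permutation with a fixed seed; both the selection method and the seed are assumptions, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_total: int = 50_000, n_val: int = 5_000, seed: int = 0):
    """Hypothetical reproducible train/validation split of index positions.

    A uniform random split with a fixed seed is an assumption; the paper
    only reports the split sizes.
    """
    indices = list(range(n_total))
    rng = random.Random(seed)   # local RNG so global random state is untouched
    rng.shuffle(indices)
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])
    return train_idx, val_idx


train_idx, val_idx = split_indices()
print(len(train_idx), len(val_idx))  # 45000 5000
```

Fixing and publishing the seed (or the index lists themselves) is what makes such a split reproducible; the sizes alone do not identify which images were held out.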