Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Continual Learning with Evolving Class Ontologies

Authors: Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments lead to some surprising conclusions; while the current status quo in the field is to relabel existing datasets with new class ontologies (such as COCO-to-LVIS or Mapillary 1.2-to-2.0), LECO demonstrates that a far better strategy is to annotate new data with the new ontology. However, this produces an aggregate dataset with inconsistent old-vs-new labels, complicating learning. To address this challenge, we adopt methods from semi-supervised and partial-label learning. We demonstrate that such strategies can surprisingly be made near-optimal, in the sense of approaching an oracle that learns on the aggregate dataset exhaustively labeled with the newest ontology.
Researcher Affiliation | Collaboration | Zhiqiu Lin (CMU), Deepak Pathak (CMU), Yu-Xiong Wang (UIUC), Deva Ramanan (CMU and Argo AI), Shu Kong (Texas A&M University)
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Open-source code is available on the project webpage.
Open Datasets | Yes | We define LECO benchmarks using three datasets: CIFAR100 [42], iNaturalist [69, 75], and Mapillary [52, 55]. CIFAR100 is released under the MIT license, and iNaturalist and Mapillary are publicly available for non-commercial research and educational purposes.
Dataset Splits | Yes | For each benchmark, we sample data from the corresponding dataset to construct time periods (TPs)... Moreover, in each TP of the CIFAR-LECO and iNat-LECO benchmarks, we randomly sample 20% of the data as a validation set for hyperparameter tuning and model selection. We use their official validation sets as our test sets for benchmarking. In Mapillary-LECO, we do not use a validation set but instead use the default hyperparameters reported in [72], which were tuned for another related dataset, Cityscapes [14].
Hardware Specification | Yes | While the above techniques are developed in the context of image classification, they appear to be less effective for semantic segmentation (on the Mapillary dataset): e.g., ST-Soft requires extremely large storage to save per-pixel soft labels, and FixMatch requires two large models designed for semantic segmentation. Therefore, in Mapillary-LECO, we adopt ST-Hard, which is computationally friendly given our compute resource (an NVIDIA RTX 3090 Ti with 24 GB of memory).
Software Dependencies | No | The paper mentions software components and techniques such as 'SGD with momentum', 'cosine annealing learning rate schedule', 'weight decay', 'RandAugment [16]', and 'HRNet with OCR module [72, 76, 81]', but does not specify their version numbers.
Experiment Setup | Yes | We adopt standard training techniques, including SGD with momentum, a cosine annealing learning rate schedule, weight decay, and RandAugment [16]. We set the maximum number of training epochs (2000/300/800 on CIFAR/iNat/Mapillary, respectively) large enough to ensure convergence. ... In training, we sample the same amount of old and new data in each batch, i.e., |B̂_M| = |B_K| (64/30/4 for CIFAR/iNat/Mapillary). We assign equal weight to L_SSL, L_Joint, and L.
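The Dataset Splits entry describes holding out a random 20% of each time period (TP) in CIFAR-LECO and iNat-LECO for validation. A minimal Python sketch of such a per-TP split follows; the function names, the integer sample IDs, the per-TP seeds, and the rounding rule are illustrative assumptions, not the authors' code.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_time_period(sample_ids: Sequence[int], val_fraction: float = 0.2,
                      seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly hold out `val_fraction` of one TP for validation."""
    rng = random.Random(seed)  # local RNG so each split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_val = round(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]  # (train_ids, val_ids)


def split_benchmark(tps: Dict[int, Sequence[int]], val_fraction: float = 0.2,
                    seed: int = 0) -> Dict[int, Tuple[List[int], List[int]]]:
    """Hypothetical helper: apply the same held-out fraction independently to every TP."""
    return {tp: split_time_period(ids, val_fraction, seed + tp)
            for tp, ids in tps.items()}


if __name__ == "__main__":
    # Two hypothetical TPs of 1,000 samples each (sizes chosen only for illustration).
    tps = {0: range(0, 1000), 1: range(1000, 2000)}
    for tp, (train_ids, val_ids) in split_benchmark(tps).items():
        print(tp, len(train_ids), len(val_ids))  # 800 train / 200 val per TP
```

Seeding each TP separately keeps its split fixed even if other TPs are added or removed later.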
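The Hardware Specification entry motivates ST-Hard by the storage cost of ST-Soft's per-pixel soft labels. The NumPy sketch below illustrates that trade-off: a hard pseudo-label keeps one class index per pixel, whereas a soft label keeps a score for every class. The helper names, the uint8/float16 storage types, and the image count, resolution, and class count in the usage example are hypothetical, not values reported in the paper, and the sketch omits any confidence thresholding or other refinements the authors may apply.

```python
from typing import Tuple

import numpy as np


def hard_pseudo_labels(probs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: collapse (C, H, W) class probabilities to one label per pixel."""
    if probs.shape[0] > 256:
        raise ValueError("uint8 storage assumes at most 256 classes")
    return probs.argmax(axis=0).astype(np.uint8)


def pseudo_label_storage_bytes(num_images: int, height: int, width: int,
                               num_classes: int,
                               soft_dtype=np.float16) -> Tuple[int, int]:
    """Bytes needed to cache hard vs. soft per-pixel pseudo-labels (no compression)."""
    pixels = num_images * height * width
    hard = pixels * np.dtype(np.uint8).itemsize                  # one class index per pixel
    soft = pixels * num_classes * np.dtype(soft_dtype).itemsize  # one score per class per pixel
    return hard, soft


if __name__ == "__main__":
    # Hypothetical workload: 1,000 images at 1024x2048 with 100 classes.
    hard, soft = pseudo_label_storage_bytes(1000, 1024, 2048, 100)
    print(f"hard: {hard / 1e9:.1f} GB, soft: {soft / 1e9:.1f} GB")  # hard: 2.1 GB, soft: 419.4 GB
```

Under these assumptions the soft cache is num_classes times the 2-byte float size larger, i.e. 200 times the hard cache.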
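The Experiment Setup entry states that each training batch contains equal amounts of old and new data, |B̂_M| = |B_K|. A minimal Python sketch of such a sampler follows. It assumes, as our own reading, that B̂_M is drawn from the old (previously labeled) pool and B_K from the newly annotated pool, and that 64/30/4 is the size of each half; the function name, sampling without replacement within a batch, and the seed are illustrative choices.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_batches(old_pool: Sequence[T], new_pool: Sequence[T], per_source: int,
                     num_batches: int, seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Hypothetical sampler: yield (old_half, new_half) with equal sizes per batch."""
    if per_source > min(len(old_pool), len(new_pool)):
        raise ValueError("per_source exceeds the size of one of the pools")
    rng = random.Random(seed)
    for _ in range(num_batches):
        # Sample without replacement within a batch; pools are reused across batches.
        yield rng.sample(old_pool, per_source), rng.sample(new_pool, per_source)


# Per-source sizes quoted in the entry (our interpretation: size of each half).
PER_SOURCE = {"CIFAR": 64, "iNat": 30, "Mapillary": 4}

if __name__ == "__main__":
    old = [f"old_{i}" for i in range(500)]  # hypothetical sample IDs
    new = [f"new_{i}" for i in range(500)]
    for old_half, new_half in balanced_batches(old, new, PER_SOURCE["CIFAR"], num_batches=2):
        print(len(old_half), len(new_half))  # 64 64
```

Fixing the ratio per batch, rather than sampling from the pooled data, keeps the new-ontology labels from being swamped when the old pool is much larger.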
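The Experiment Setup entry also lists a cosine annealing learning rate schedule. One common closed form without warm restarts is lr(t) = lr_min + 0.5 (lr_base − lr_min)(1 + cos(πt/T)), which decays the rate from lr_base at t = 0 to lr_min at t = T. The Python sketch below assumes per-epoch updates, a hypothetical base rate of 0.1, and a minimum of 0, none of which the entry specifies; only the 2000-epoch horizon in the usage example comes from the CIFAR setting quoted above.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int, base_lr: float,
                        min_lr: float = 0.0) -> float:
    """Closed-form cosine annealing (no restarts): base_lr at epoch 0, min_lr at the end."""
    progress = min(max(epoch, 0), total_epochs) / total_epochs  # clamp to [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical base rate; 2000 epochs mirrors the CIFAR maximum quoted above.
    for epoch in (0, 500, 1000, 1500, 2000):
        print(epoch, round(cosine_annealing_lr(epoch, 2000, base_lr=0.1), 4))
```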