Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
GALAXY: Graph-based Active Learning at the Extreme
Authors: Jifan Zhang, Julian Katz-Samuels, Robert Nowak
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate GALAXY's superiority over existing state-of-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets. We conduct experiments under 8 different class imbalance settings. |
| Researcher Affiliation | Academia | University of Wisconsin, Madison, USA. Correspondence to: Jifan Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 S2: Shortest Shortest Path. Algorithm 2 Build Graph. Algorithm 3 Connect: build higher order edges. Algorithm 4 GALAXY. |
| Open Source Code | Yes | Code can be found in https://github.com/jifanz/GALAXY. |
| Open Datasets | Yes | We generate the extremely unbalanced settings for both binary and multi-class classification from popular vision datasets CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), PathMNIST (Yang et al., 2021) and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using 'the pool' and evaluating 'over the pool', but no specific split percentages or counts for training, validation, or test sets are detailed. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions 'training deep learning systems'. |
| Software Dependencies | No | The paper mentions 'ResNet-18 model in PyTorch' and 'Adam optimization algorithm' but does not specify version numbers for PyTorch or any other software dependencies, making it difficult to precisely reproduce the software environment. |
| Experiment Setup | Yes | We set B = 100 and T = 50. We use the ResNet-18 model in PyTorch pretrained on ImageNet for initialization and cold-start the training for every labeled set L. We use the Adam optimization algorithm with learning rate of $10^{-2}$ and a fixed 500 epochs for each L. We use a cross entropy loss weighted by $1/N_k(L)$ for each class k. |
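
The class-weighted loss quoted in the Experiment Setup row can be illustrated with a small, dependency-free sketch. It is not taken from the paper's repository: the function names are hypothetical, the handling of classes with no labeled examples (weight 0) is an assumption, and normalizing by the sum of the weights is an assumption chosen to match the default mean reduction of PyTorch's `torch.nn.CrossEntropyLoss` when a `weight` tensor is supplied.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return w_k = 1 / N_k(L), where N_k(L) is the number of labeled
    examples of class k in the labeled set L."""
    counts = Counter(labels)
    # Assumption of this sketch: a class with no labeled examples gets weight 0.
    return [1.0 / counts[k] if counts[k] > 0 else 0.0 for k in range(num_classes)]


def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted cross entropy over a batch, normalized by the sum of
    the per-example weights (an assumption, see the lead-in)."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        # Numerically stable log-sum-exp of the logits.
        m = max(z)
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total += weights[y] * (log_norm - z[y])
        norm += weights[y]
    return total / norm


# Toy labeled set with a 3:1 imbalance between class 0 and class 1.
toy_labels = [0, 0, 0, 1]
toy_weights = inverse_frequency_weights(toy_labels, num_classes=2)
toy_logits = [[0.0, 0.0] for _ in toy_labels]
toy_loss = weighted_cross_entropy(toy_logits, toy_labels, toy_weights)
```

With these weights each class contributes the same total weight to the loss regardless of how many labeled examples it has, which is the usual motivation for inverse-frequency weighting in unbalanced settings. In an actual PyTorch run the same weights would typically be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, with `torch.optim.Adam` configured at `lr=1e-2` as the row describes.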