Asynchronous Perception Machine for Efficient Test Time Training
Authors: Rajat Modi, Yogesh Rawat
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We quantitatively evaluate APM on zero-shot test-time training on popular benchmarks containing distribution shifts [15, 86, 84]. Next, we quantitatively explore its computational efficiency. Datasets: CIFAR-10-C [86] contains 5 levels of corruption on the test set with 15 noise types. Larger datasets with significant distribution shifts consist of ImageNet val/other curated splits. For example, ImageNet-V2 contains 10k natural images across 1000 classes. ImageNet-A contains 7,500 adversarial images across 200 categories. ImageNet-R consists of 30,000 artistic images across 200 ImageNet categories. ImageNet-Sketch consists of 50,000 black-and-white sketches across 1000 classes. ImageNet-C consists of 15 corruption types with 5 severity levels. There are 9 additional cross-dataset generalization datasets [84]. Baselines: We compare against standard TTT-online [86], TTT-MAE [15], TPT [84], CLIP ViT-B/16, CoOp, and CoCoOp. We also benchmark CLIP ViT-L/14 and the strongest OpenCLIP ViT-H/14-quickgelu variant pre-trained on dfn5b. Results and Analysis: We study APM's test-time-training performance on several datasets. APM processes each test sample individually: the weights are re-drawn from a normal distribution after processing every sample to prevent information leakage. For zero-shot classification of a test sample, APM leverages the 80 textual prompt templates similar to the ones used in CLIP (a prompt-ensembling sketch appears after this table). In Tab. 1, APM scales the zero-shot classification task up to datasets with 1000 classes. Using CLIP ViT-B/16 as a teacher, we surpass TPT [84] with an average score of 62.6 and an average OOD score of 61.2. |
| Researcher Affiliation | Academia | Rajat Modi, Yogesh Singh Rawat, Centre for Research in Computer Vision, University of Central Florida, Orlando, FL 32765. rajat.modi@ucf.edu, yogesh@crcv.ucf.edu |
| Pseudocode | Yes | Appendix C: Pseudo-code for APM's operation. In Algorithm 1, we have expanded the entire pseudo-code to train APM beyond the application of test-time-training. ... Algorithm 1: Training APM in a self-supervised manner using a teacher U. ... Algorithm 2: Pseudo-code for the operation of APM during test-time-training. |
| Open Source Code | Yes | Our code is publicly available at this link. (In abstract) ... In order to ensure the reproducibility of our experiments, we have shared the model in the supplementary during the review process. The code and model weights shall be released post-review. |
| Open Datasets | Yes | Datasets: CIFAR-10-C [86] contains 5 levels of corruption on the test set with 15 noise types. Larger datasets with significant distribution shifts consist of ImageNet val/other curated splits. For example, ImageNet-V2 contains 10k natural images across 1000 classes. ImageNet-A contains 7,500 adversarial images across 200 categories. ImageNet-R consists of 30,000 artistic images across 200 ImageNet categories. ImageNet-Sketch consists of 50,000 black-and-white sketches across 1000 classes. ImageNet-C consists of 15 corruption types with 5 severity levels. There are 9 additional cross-dataset generalization datasets [84]. |
| Dataset Splits | No | The paper does not explicitly state specific train/validation/test splits using percentages or counts for any of the datasets. It mentions datasets like ImageNet val/other curated splits, and references benchmarks which might have predefined splits, but doesn't detail them in the text. |
| Hardware Specification | Yes | All experiments are run on the same desktop workstation containing 1x RTX A6000 / 96 GB RAM / Ubuntu 22.04 / 2 TB SSD. |
| Software Dependencies | Yes | All the code has been written in PyTorch version 1.13.0. |
| Experiment Setup | Yes | All hyper-parameters utilized for APM during test-time-training are detailed in Tab. 7. We leveraged the seed 42 in most of our experiments, and also conducted experiments with multiple seeds. The weight matrices in APM were initialized from a normal distribution with µ = 0 and σ = 0.01. ... Table 7: APM hyperparameters during test-time-training. Number of test samples: 50,000 (ImageNet splits), variable for other datasets; Testing iterations: 20; Batch size: 1; Learning rate: 1e-4; Optimizer: Adam; Feature output size d: 768/1024; Positional encoding size: 768/1024; Image/crop size: 448; Augmentations: normalization with µ = (0.485, 0.456, 0.406), σ = (0.229, 0.224, 0.225); Precision: fp16 (grad-scaled); Num workers: 8; Hardware/OS: 1x RTX A6000 48 GB / 96 GB RAM / Ubuntu 22.04 / 2 TB SSD / 5 TB HDD. (A minimal sketch of this per-sample loop, using these hyper-parameters, appears directly after the table.) |
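
The Experiment Setup and Results rows describe a per-sample protocol: weights are re-drawn from N(0, 0.01) before each test sample, then the student is adapted for 20 Adam steps at lr 1e-4 with batch size 1 in fp16. The sketch below illustrates that protocol only; `APMStudent` (the student module), `teacher`, and the MSE distillation loss are placeholders, not the paper's exact architecture or objective (see Algorithm 2 in Appendix C for the authors' version).

```python
# Hedged sketch of the per-sample test-time-training protocol reported in Tab. 7.
# Assumptions: the student/teacher modules and the distillation loss are placeholders.
import torch
import torch.nn as nn

def reinit_(module: nn.Module, std: float = 0.01) -> None:
    # Re-draw every weight from N(0, 0.01) before each test sample,
    # so no information leaks from one sample to the next.
    for p in module.parameters():
        nn.init.normal_(p, mean=0.0, std=std)

def test_time_train(student, teacher, test_loader, iters=20, lr=1e-4):
    scaler = torch.cuda.amp.GradScaler()               # fp16 (grad-scaled)
    for image, _ in test_loader:                        # batch size 1
        image = image.cuda()
        reinit_(student)                                 # fresh weights per sample
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        with torch.no_grad():
            target = teacher(image)                      # frozen CLIP teacher features
        for _ in range(iters):                           # 20 testing iterations
            opt.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast():
                pred = student(image)
                loss = nn.functional.mse_loss(pred, target)  # assumed distillation loss
            scaler.scale(loss).backward()
            scaler.step(opt)
            scaler.update()
        yield student                                    # adapted student used for prediction
```

The generator yields one freshly adapted student per test sample, matching the paper's claim that each sample is processed independently.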
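
The Results row also notes that APM reuses the 80 CLIP textual prompt templates for zero-shot classification. Below is a minimal prompt-ensembling sketch using the OpenAI `clip` package with a small subset of templates for illustration; it shows only the standard template-averaging step, not APM's own adaptation.

```python
# Hedged sketch: zero-shot classification via CLIP prompt-template ensembling.
# Only 4 of the 80 templates are shown; APM's test-time-training step is omitted.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

templates = [
    "a photo of a {}.",
    "a bad photo of a {}.",
    "a sketch of a {}.",
    "a photo of the large {}.",
]

@torch.no_grad()
def build_classifier(classnames):
    """Average the normalized text embeddings over all templates per class."""
    weights = []
    for name in classnames:
        tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
        emb = model.encode_text(tokens)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        emb = emb.mean(dim=0)
        weights.append(emb / emb.norm())
    return torch.stack(weights, dim=1)                   # (d, num_classes)

@torch.no_grad()
def zero_shot_logits(pil_image, classifier):
    feat = model.encode_image(preprocess(pil_image).unsqueeze(0).to(device))
    feat = feat / feat.norm(dim=-1, keepdim=True)
    return 100.0 * feat @ classifier                     # (1, num_classes)
```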