Test-time Adaptation with Slot-Centric Models
Authors: Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Slot-TTA across multiple input modalities (images or 3D point clouds) and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and alternative test-time adaptation methods. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Mila / DeepMind, ³Google Research. |
| Pseudocode | No | The paper describes its methods through text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available to the community on our project webpage: http://slot-tta.github.io. |
| Open Datasets | Yes | We test Slot-TTA in scene segmentation of multi-view posed images, single-view images and 3D point clouds in the datasets of PartNet (Mo et al., 2019), MultiShapeNet-Hard (Sajjadi et al., 2022b) and CLEVR (Johnson et al., 2017). |
| Dataset Splits | No | The paper describes train and test splits, but a distinct 'validation' split for hyperparameter tuning or early stopping is not explicitly mentioned. |
| Hardware Specification | Yes | Test-time adaptation for each example takes about 10 seconds on a single TPUv2 chip. We use a single V100 GPU for training and inference. |
| Software Dependencies | No | The paper mentions optimizers like Adam but does not specify any software libraries, frameworks, or programming languages with version numbers (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | We use a batch size of 256 in this setting. We set our learning rate to 10⁻⁴. We use an Adam optimizer with β1 = 0.9, β2 = 0.999. For training, our model takes about 4 days to converge using 64 TPUv2 chips. (A minimal sketch of this setup follows the table.) |
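
Since the table only quotes hyperparameters (Adam, learning rate 10⁻⁴, β1 = 0.9, β2 = 0.999, batch size 256) and the paper does not pin down a framework (see Software Dependencies), here is a minimal sketch, assuming PyTorch, of what the reported optimizer setup and a per-example test-time adaptation loop might look like. The tiny autoencoder stand-in, the `test_time_adapt` helper, the step count, and the MSE reconstruction objective are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Stand-in model: a tiny autoencoder used purely as a placeholder for the
# paper's slot-centric architecture, which this table does not specify.
model = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 128))

# Training-time optimizer settings as quoted in the Experiment Setup row.
train_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

def test_time_adapt(model, x, num_steps=10):
    """Adapt model parameters on a single unlabeled test example by gradient
    descent on a reconstruction loss -- the general test-time adaptation
    recipe; the loss and step count here are assumptions."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    for _ in range(num_steps):
        opt.zero_grad()
        loss = ((model(x) - x) ** 2).mean()  # assumed MSE reconstruction objective
        loss.backward()
        opt.step()
    return model

# Usage: adapt independently on one unlabeled test example.
x = torch.randn(1, 128)
test_time_adapt(model, x)
```

Adapting each test example independently in this way is consistent with the per-example cost quoted under Hardware Specification (about 10 seconds per example on a single TPUv2 chip), though the actual objective and architecture are described only in the paper itself.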