Self-supervised Transformation Learning for Equivariant Representations

Authors: Jaemyung Yu, Jaehyun Choi, DongJae Lee, HyeongGwon Hong, Junmo Kim

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type: Experimental. We demonstrate the approach's effectiveness across diverse classification and detection tasks, outperforming existing methods in 7 out of 11 benchmarks and excelling in detection. Additionally, by incorporating AugMix, a complex transformation that is not feasible with existing equivariant learning, STL enhances performance across all tasks, showcasing its broad applicability. STL's compatibility with diverse base models further highlights its versatility, as it achieves the highest average accuracy across foundational models. Extensive experiments and ablation studies further validate STL's ability to capture interdependencies among transformations in an unsupervised manner.
Researcher Affiliation: Academia. Jaemyung Yu, Jaehyun Choi, Dong-Jae Lee, Hyeong Gwon Hong, Junmo Kim; Korea Advanced Institute of Science and Technology (KAIST). {jaemyung,chlwogus,jhtwosun,honggudrnjs,junmo.kim}@kaist.ac.kr
Pseudocode: No. The paper describes the methodology in text and mathematical formulas but does not provide pseudocode or an algorithm block.
Open Source Code: Yes. The code is available at https://github.com/jaemyung-u/stl.
Open Datasets: Yes. We pretrain on STL10 [7] with ResNet-18 and ImageNet100 [39, 42] with ResNet-50, following the split in [42]. Table 10: Dataset Information. Overview of dataset composition and evaluation metrics.
Dataset Splits: Yes. Table 10: Dataset Information. Overview of dataset composition and evaluation metrics. Each dataset specifies the number of classes, training/validation/test splits, and the corresponding evaluation metric. For linear evaluation datasets, validation samples are randomly selected from the training split if an official validation split is not provided.
Hardware Specification: Yes. For the pretraining experiments, we use an NVIDIA RTX 4090.
Software Dependencies: No. The paper mentions various architectures and methods but does not provide specific version numbers for software libraries such as PyTorch, TensorFlow, or other key dependencies used in the implementation.
Experiment Setup: Yes. For STL and explicit baselines (SEN, EquiMod, and SIE), we use an equivariant transformation network with a hypernetwork based on SIE... STL also includes a 3-layer MLP with a 512-dimensional hidden layer to encode 128-dimensional transformation representations from input pairs... The transformation prediction loss weight is set to 0.5 for implicit baselines, and the equivariant learning weight is set to 1 for explicit baselines. STL uses weights of 1, 1, and 0.2 for the invariant, equivariant, and transformation learning losses, respectively. All methods are trained for 500 epochs with a batch size of 256, using a cosine learning rate schedule without restarts [30]. The initial learning rate is set to 0.03, with a weight decay of 0.0005. The model includes a 3-layer projection MLP head, g(·), with a hidden dimension of 2048 and an output dimension of 128. Batch normalization [24] is excluded from the last layer.
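The reported setup can be sketched in PyTorch as follows. This is a minimal reconstruction from the hyperparameters quoted above, not the authors' released code: the function and variable names are illustrative, and the 512-dimensional input assumes a ResNet-18 backbone as used for STL10 pretraining.

```python
# Hedged sketch of the quoted training setup (illustrative names, not the
# authors' implementation).
import torch
import torch.nn as nn

def make_projection_head(in_dim: int = 512, hidden: int = 2048, out_dim: int = 128) -> nn.Sequential:
    """3-layer projection MLP g(.) with hidden dim 2048 and output dim 128.
    Batch normalization is applied to the hidden layers but excluded from
    the last layer, as stated in the setup."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, out_dim),  # no BatchNorm on the final layer
    )

g = make_projection_head()

# Optimizer and schedule per the quoted hyperparameters: initial LR 0.03,
# weight decay 0.0005, cosine schedule over 500 epochs without restarts.
optimizer = torch.optim.SGD(g.parameters(), lr=0.03, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

def stl_total_loss(l_inv, l_equi, l_trans, weights=(1.0, 1.0, 0.2)):
    """Weighted sum of the invariant, equivariant, and transformation
    learning losses with the reported weights 1, 1, and 0.2."""
    return weights[0] * l_inv + weights[1] * l_equi + weights[2] * l_trans

z = g(torch.randn(4, 512))
print(z.shape)  # torch.Size([4, 128])
```

A batch size of 256 would be used in practice; the batch of 4 above is only to show the head's input/output shapes.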