An Information Theory Perspective on Variance-Invariance-Covariance Regularization

Authors: Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Building on these results, we introduce a family of SSL methods derived from information-theoretic principles that outperform existing SSL techniques. ... We used a ResNet-50 model trained with SimCLR or VICReg objectives on CIFAR-10, CIFAR-100, and ImageNet datasets. ... As evidenced by Table 1, the proposed entropy estimators surpass the original SSL methods."
Researcher Affiliation | Collaboration | Ravid Shwartz-Ziv (New York University); Randall Balestriero (Meta AI, FAIR); Kenji Kawaguchi (National University of Singapore); Tim G. J. Rudner (New York University); Yann LeCun (New York University & Meta AI, FAIR)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; it provides mathematical derivations and proofs.
Open Source Code | No | "Additionally, we are committed to ensuring reproducibility and open science. Therefore, after publication, we will provide pretrained checkpoints and make the code openly available on a public repository."
Open Datasets | Yes | "We used a ResNet-50 model trained with SimCLR or VICReg objectives on CIFAR-10, CIFAR-100, and ImageNet datasets. ... Experiments were conducted on three image datasets: CIFAR-10, CIFAR-100 [39], and Tiny-ImageNet [20]."
Dataset Splits | No | Appendix H states: "Upon its completion, we transition to the linear evaluation phase, which serves as an assessment tool for the quality of the representation produced by the pretrained encoder. ... we measure the test accuracy of the trained linear classifier using a separate validation dataset." While a validation dataset is mentioned, neither the main text nor the appendix specifies its size, the split ratio, or whether a predefined split was used.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks like PyTorch or TensorFlow).
Experiment Setup | Yes | "The training process for each model unfolds over 800 epochs, employing a batch size of 512. We utilize the Stochastic Gradient Descent (SGD) optimizer, characterized by a momentum of 0.9 and a weight decay of 1e-4. The learning rate is initiated at 0.5 and is adjusted according to a cosine decay schedule complemented by a linear warmup phase. ... For the linear evaluation phase, the linear classifier is trained for 100 epochs with a batch size of 256. The SGD optimizer is again employed, this time with a momentum of 0.9 and no weight decay. The learning rate is managed using a cosine decay schedule, starting at 0.2 and reaching a minimum of 2e-4."
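Since the code was not released at the time of assessment, there is no reference implementation for the setup quoted above. The sketch below is a minimal PyTorch reconstruction of those optimizer and schedule hyperparameters only; the warmup length, the steps-per-epoch value, the torchvision ResNet-50 stand-in, and the 10-class linear head are assumptions not stated in the paper.

```python
# Minimal sketch of the reported optimization setup; not the authors' code.
# Assumes PyTorch and torchvision; model and dataset sizes are placeholders.
import math
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR
from torchvision.models import resnet50

def cosine_with_warmup(optimizer, warmup_steps, total_steps, min_lr_ratio=0.0):
    """Linear warmup followed by cosine decay, expressed as an LR multiplier."""
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
    return LambdaLR(optimizer, lr_lambda)

steps_per_epoch = 100  # placeholder; depends on dataset size and batch size

# Pretraining: 800 epochs, batch size 512, SGD(momentum=0.9, weight_decay=1e-4),
# base LR 0.5 with linear warmup + cosine decay (10-epoch warmup is an assumption).
encoder = resnet50()
pretrain_opt = SGD(encoder.parameters(), lr=0.5, momentum=0.9, weight_decay=1e-4)
pretrain_sched = cosine_with_warmup(
    pretrain_opt,
    warmup_steps=10 * steps_per_epoch,
    total_steps=800 * steps_per_epoch,
)

# Linear evaluation: 100 epochs, batch size 256, SGD(momentum=0.9, no weight decay),
# cosine-decayed LR from 0.2 down to a floor of 2e-4.
linear_head = torch.nn.Linear(2048, 10)  # 2048-dim ResNet-50 features, 10 classes assumed
eval_opt = SGD(linear_head.parameters(), lr=0.2, momentum=0.9, weight_decay=0.0)
eval_sched = cosine_with_warmup(
    eval_opt,
    warmup_steps=0,
    total_steps=100 * steps_per_epoch,
    min_lr_ratio=2e-4 / 0.2,
)
```

In a training loop, each scheduler's step() would be called once per optimizer update so that the warmup and cosine decay are applied per iteration rather than per epoch.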