Minimalistic Unsupervised Representation Learning with the Sparse Manifold Transform
Authors: Yubei Chen, Zeyu Yun, Yi Ma, Bruno Olshausen, Yann LeCun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With a one-layer deterministic (one training epoch) sparse manifold transform, it is possible to achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10, and 53.2% on CIFAR-100. With simple grayscale augmentation, the model achieves 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic white-box methods and SOTA methods. We also provide visualization to illustrate how an unsupervised representation transform is formed. |
| Researcher Affiliation | Collaboration | Yubei Chen (1,2), Zeyu Yun (4,5), Yi Ma (4), Bruno Olshausen (4,5,6), Yann LeCun (1,2,3); 1 Meta AI; 2 Center for Data Science, New York University; 3 Courant Institute, New York University; 4 EECS Dept., 5 Redwood Center, 6 Helen Wills Neuroscience Inst., UC Berkeley |
| Pseudocode | Yes | Algorithm 1: Online 1-sparse dictionary learning heuristic (K-means clustering); a hedged Python sketch of this heuristic is given after the table. |
| Open Source Code | No | The paper does not contain an explicit statement of open-source code release or a link to a repository for the described methodology. |
| Open Datasets | Yes | achieves 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10, and 53.2% on CIFAR-100 without data augmentation. ... We use the WikiText-103 corpus [76] to train the SMT word embedding, which contains around 103 million tokens. |
| Dataset Splits | No | The paper mentions using standard datasets like MNIST, CIFAR-10, and CIFAR-100 for evaluation and refers to “training examples” in ablation studies, but it does not explicitly state the training, validation, and test splits (e.g., percentages or sample counts) or cite a specific source for these splits. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running experiments, such as GPU models, CPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using the Basic English Tokenizer from torchtext, the transforms module in torchvision, and solo-learn for pretraining, but it does not specify version numbers for these software dependencies or other key components. |
| Experiment Setup | Yes | For both MNIST and CIFAR, we use 6x6 image patches, i.e., T = 6. Once patch embeddings are computed, we use a spatial average pooling layer with kernel size ks = 4 and stride = 2 to aggregate patch embeddings into the final image-level embeddings (see the patch-aggregation sketch after the table). ... For all the benchmark results on the deep SSL models, we follow the pretraining procedure used in solo-learn. Each method is pretrained for 1000 epochs. ... For both methods, we use ResNet-18 as the backbone, a batch size of 256, and the LARS optimizer, with 10 epochs of warmup followed by a cosine decay. For the SimCLR model, we set the learning rate to 0.4 and the weight decay to 1e-5. We also set the temperature to 0.2, the dimension of the projection output layer to 256, and the dimension of the projection hidden layer to 2048. For VICReg, we set the learning rate to 0.3 and the weight decay to 1e-4. We also set the temperature to 0.2, and the dimensions of the projection output and hidden layers to 2048. We set the weight for the similarity loss and variance loss to 25, and the weight for the covariance loss to 1. |
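
The pseudocode entry above refers to the paper's Algorithm 1, an online 1-sparse dictionary learning heuristic in the spirit of K-means clustering. The snippet below is a minimal Python sketch of that kind of spherical K-means update, written from the table's description rather than the paper's exact listing; the atom count `n_atoms`, the learning rate `lr`, and the normalization details are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def online_one_sparse_dictionary(patches, n_atoms=8192, lr=0.01, seed=0):
    """Sketch of an online 1-sparse dictionary learning heuristic
    (spherical K-means-style updates). Hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    dim = patches.shape[1]
    # Initialize dictionary atoms randomly on the unit sphere.
    D = rng.normal(size=(n_atoms, dim))
    D /= np.linalg.norm(D, axis=1, keepdims=True)

    for x in patches:                        # a single pass over the data
        x = x / (np.linalg.norm(x) + 1e-8)   # normalize the incoming patch
        k = int(np.argmax(D @ x))            # 1-sparse assignment: best-matching atom
        D[k] = (1 - lr) * D[k] + lr * x      # move the winning atom toward the patch
        D[k] /= np.linalg.norm(D[k]) + 1e-8  # re-project the atom onto the unit sphere
    return D
```

Under this heuristic, the 1-sparse code of a new patch is simply a one-hot vector at the index of its best-matching atom.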
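
The experiment-setup entry describes how patch embeddings are aggregated into image-level embeddings: 6x6 patches, followed by spatial average pooling with kernel size 4 and stride 2. Below is a minimal PyTorch sketch of that aggregation, assuming dense stride-1 patch extraction; `patch_encoder` is a hypothetical placeholder for the paper's patch-embedding stage (whitening, sparse coding, and the manifold projection), which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def image_embedding_from_patches(image, patch_encoder,
                                 patch_size=6, pool_ks=4, pool_stride=2):
    """Sketch of patch-to-image aggregation: dense patch extraction,
    per-patch embedding, then spatial average pooling of the embedding grid."""
    C, H, W = image.shape
    # Extract all overlapping patch_size x patch_size patches (stride 1).
    patches = F.unfold(image.unsqueeze(0), kernel_size=patch_size)  # (1, C*T*T, L)
    h = H - patch_size + 1
    w = W - patch_size + 1
    emb = patch_encoder(patches.squeeze(0).T)        # (L, d) patch embeddings
    grid = emb.T.reshape(1, -1, h, w)                # (1, d, h, w) embedding grid
    # Spatial average pooling with kernel size 4 and stride 2, as in the setup.
    pooled = F.avg_pool2d(grid, kernel_size=pool_ks, stride=pool_stride)
    return pooled.flatten()                          # image-level embedding
```

The pooled vector plays the role of the image-level embedding that the KNN top-1 evaluation quoted in the table operates on.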