Matryoshka Representation Learning

Authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offers: (a) up to 14× smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14× real-world speed-ups for large-scale retrieval on ImageNet-1K and ImageNet-4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities: vision (ViT, ResNet), vision + language (ALIGN), and language (BERT).
Researcher Affiliation | Collaboration | University of Washington, Google Research, Harvard University; {kusupati,ali}@cs.washington.edu, prajain@google.com
Pseudocode | Yes | Refer to Alg. 1 and Alg. 2 in Appendix A for the building blocks of Matryoshka Representation Learning (MRL). (A minimal sketch of these building blocks appears after this table.)
Open Source Code | Yes | MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.
Open Datasets | Yes | We adapt Matryoshka Representation Learning (MRL) to various representation learning setups: (a) supervised learning for vision: ResNet50 [27] on ImageNet-1K [71] and ViT-B/16 [22] on JFT-300M [80]; (b) contrastive learning for vision + language: ALIGN model with ViT-B/16 vision encoder and BERT language encoder on ALIGN data [44]; and (c) masked language modelling: BERT [19] on English Wikipedia and BooksCorpus [97].
Dataset Splits | Yes | ImageNet-1K (train set with 1.3M samples as the database and validation set with 50K samples as the queries). ... We learn thresholds on the maximum softmax probability [31] for each nested classifier on a holdout validation set. (A sketch of this threshold cascade appears after this table.)
Hardware Specification | No | The provided text of the paper does not explicitly state the specific hardware used (e.g., GPU models, CPU types) for running the experiments. It only indicates that such details are in Appendix C and I, which are not provided.
Software Dependencies | Yes | ffcv: https://github.com/libffcv/ffcv/, 2022, commit 607d117.
Experiment Setup | Yes | We use M = {8, 16, 32, 64, 128, 256, 512, 1024, 2048} and M = {12, 24, 48, 96, 192, 384, 768} as the explicitly optimized nested dimensions respectively. ... For a given query image, we obtained a shortlist, K = 200, of images from the database using a lower-dimensional representation, e.g. Ds = 16, followed by reranking with a higher-capacity representation, e.g. Dr = 2048. (A sketch of this shortlist-and-rerank funnel appears after this table.)
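For concreteness, here is a minimal PyTorch sketch of the two building blocks the Pseudocode row points to: per-prefix linear classifier heads over nested slices of one embedding, and a summed cross-entropy loss over the nested logits. The class and variable names (`MatryoshkaLinear`, `MatryoshkaCELoss`, `nesting_dims`) and the uniform loss weights are our assumptions, not the reference implementation; consult Alg. 1 and Alg. 2 in Appendix A and the open-sourced repository for the authors' exact code.

```python
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    """One linear classifier head per nested prefix of the embedding."""
    def __init__(self, nesting_dims, num_classes):
        super().__init__()
        self.nesting_dims = nesting_dims
        self.heads = nn.ModuleList([nn.Linear(d, num_classes) for d in nesting_dims])

    def forward(self, z):
        # z: (batch, D) full embedding; each head sees only the first d dims
        return [head(z[:, :d]) for head, d in zip(self.heads, self.nesting_dims)]

class MatryoshkaCELoss(nn.Module):
    """Weighted sum of cross-entropy losses over all nested logits."""
    def __init__(self, weights=None):
        super().__init__()
        self.weights = weights  # None -> uniform weights (our assumption)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, nested_logits, target):
        w = self.weights or [1.0] * len(nested_logits)
        return sum(wi * self.ce(logits, target) for wi, logits in zip(w, nested_logits))

# Usage: nested dims for ResNet50 on ImageNet-1K, per the Experiment Setup row
nesting_dims = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]
head = MatryoshkaLinear(nesting_dims, num_classes=1000)
loss_fn = MatryoshkaCELoss()
z = torch.randn(4, 2048)           # stand-in backbone embeddings
y = torch.randint(0, 1000, (4,))   # stand-in labels
loss = loss_fn(head(z), y)
```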
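The Dataset Splits row mentions learning per-classifier thresholds on the maximum softmax probability for adaptive classification. Below is a hedged sketch of how such a cascade could work: start with the smallest nested classifier and escalate to larger prefixes until one clears its confidence threshold. The function name `cascade_predict` and the threshold values in the usage line are hypothetical; the paper learns its thresholds on a holdout validation set.

```python
import torch
import torch.nn.functional as F

def cascade_predict(nested_logits, thresholds):
    """Accept the smallest nested classifier that is confident enough.

    nested_logits: list of (batch, num_classes) tensors, smallest prefix first.
    thresholds: one max-softmax-probability cutoff per head.
    Returns (predictions, index of the head that decided each example).
    """
    batch = nested_logits[0].shape[0]
    preds = torch.zeros(batch, dtype=torch.long)
    used = torch.zeros(batch, dtype=torch.long)
    undecided = torch.ones(batch, dtype=torch.bool)
    for i, logits in enumerate(nested_logits):
        conf, pred = F.softmax(logits, dim=-1).max(dim=-1)
        confident = conf >= thresholds[i]
        if i == len(nested_logits) - 1:
            confident[:] = True  # the largest head decides all leftovers
        accept = undecided & confident
        preds[accept] = pred[accept]
        used[accept] = i
        undecided &= ~accept
    return preds, used

# Usage with three hypothetical heads over a batch of 5 examples
logits = [torch.randn(5, 1000) for _ in range(3)]
preds, used = cascade_predict(logits, thresholds=[0.9, 0.8, 0.7])
```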
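The Experiment Setup row describes adaptive retrieval: shortlist with a low-dimensional prefix (Ds = 16), then rerank the K = 200 candidates with the full representation (Dr = 2048). Here is a minimal NumPy sketch of that funnel, assuming L2-normalized embeddings and brute-force search rather than the ANN index behind the paper's wall-clock speed-ups; `shortlist_and_rerank` and the random stand-in data are ours.

```python
import numpy as np

def shortlist_and_rerank(query, database, ds=16, k=200):
    """Funnel retrieval: shortlist with the first `ds` dims, rerank with all.

    Rows of `database` and `query` are assumed L2-normalized, so dot
    products are cosine similarities; prefixes are renormalized before
    the coarse pass.
    """
    # Stage 1: coarse scores from the renormalized low-dimensional prefix
    db_prefix = database[:, :ds]
    db_prefix = db_prefix / np.linalg.norm(db_prefix, axis=1, keepdims=True)
    q_prefix = query[:ds] / np.linalg.norm(query[:ds])
    shortlist = np.argpartition(-(db_prefix @ q_prefix), k)[:k]
    # Stage 2: rerank the K candidates with the full-capacity representation
    fine = database[shortlist] @ query
    return shortlist[np.argsort(-fine)]

# Usage with random stand-in data (the real database is ~1.3M x 2048)
rng = np.random.default_rng(0)
db = rng.standard_normal((10000, 2048)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = rng.standard_normal(2048).astype(np.float32)
q /= np.linalg.norm(q)
top10 = shortlist_and_rerank(q, db)[:10]
```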