ResMem: Learn what you can and memorize the rest
Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Rawat, Manzil Zaheer, Aditya K. Menon, Sanjiv Kumar
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that ResMem consistently improves the test set generalization of the original prediction model across standard vision and natural language processing benchmarks. |
| Researcher Affiliation | Collaboration | Zitong Yang, Stanford University, Stanford, CA 94305, zitong@berkeley.edu; Michal Lukasik, Google Research, New York, NY 10011, mlukasik@google.com |
| Pseudocode | No | The paper describes the algorithm steps in numbered lists (Section 4.1) and illustrates it in Figure 1, but does not present it in a formal pseudocode or 'Algorithm' block (a hedged sketch of the described procedure follows this table). |
| Open Source Code | No | No explicit statement providing concrete access to source code for the methodology described in this paper was found. |
| Open Datasets | Yes | Empirically, we show that such explicit memorization indeed leads to generalization benefits: ResMem consistently improves the test accuracy of a baseline DeepNet on image classification tasks with CIFAR100 [33], and autoregressive language modeling on C4 [42] (Section 4). |
| Dataset Splits | Yes | For the language experiment... we created the query embeddings using the whole validation split and the same representation location. |
| Hardware Specification | No | The paper mentions 'CPU latency' but does not specify any particular CPU model. No other specific hardware details like GPU models, CPU types, or cloud instance specifications used for experiments are provided. |
| Software Dependencies | No | The paper mentions using 'Keras' for MobileNet-V2 models and 'ScaNN' for nearest neighbor search, but no specific version numbers for these or other software dependencies are provided. |
| Experiment Setup | Yes | For all six DeepNet trainings, we use SGD with batch size 128, trained for 256 epochs. We use a peak learning rate 0.4, and momentum 0.9. We warm up the learning rate linearly for the first 15 epochs, and decay the learning rate by 0.1 after epochs {96, 192, 224}. For ResMem, we use ... σ = 0.7, k = 53, and T = 1.4... We pre-trained the DeepNet ... for 1,000,000 steps, with dropout rate of 0.1 and batch size of 128. The learning rate for the first 10,000 steps is fixed to 0.01... (the learning-rate schedule is sketched after this table) |
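
The paper gives the ResMem procedure as numbered prose steps (Section 4.1, Figure 1) rather than pseudocode: train a base DeepNet, compute its residuals on the training set, memorize those residuals with a soft k-nearest-neighbor regressor over the DeepNet's embeddings, and add the retrieved residual to the base prediction at test time. The sketch below is our minimal reading of that procedure, not the authors' code; the function names are ours, and the exact placement of the temperature T and the kernel bandwidth σ in the residual and weighting formulas is an assumption that should be checked against the paper.

```python
# Hedged sketch of the ResMem combination step; illustrative only.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fit_residual_memory(train_embeddings, train_labels_onehot, train_logits,
                        temperature=1.4, k=53):
    """Memorize training residuals: one-hot label minus tempered base prediction."""
    residuals = train_labels_onehot - softmax(train_logits / temperature)
    index = NearestNeighbors(n_neighbors=k).fit(train_embeddings)
    return index, residuals


def resmem_predict(test_logits, test_embeddings, index, residuals, sigma=0.7,
                   temperature=1.4):
    """Base prediction plus a kernel-weighted average of neighbors' residuals."""
    dists, idx = index.kneighbors(test_embeddings)      # both of shape (n, k)
    weights = np.exp(-(dists ** 2) / sigma)             # soft k-NN weights (assumed form)
    weights /= weights.sum(axis=1, keepdims=True)
    retrieved = (weights[:, :, None] * residuals[idx]).sum(axis=1)
    return softmax(test_logits / temperature) + retrieved
```

The hyperparameter defaults (σ = 0.7, k = 53, T = 1.4) are the values quoted in the 'Experiment Setup' row; the paper uses ScaNN for the nearest-neighbor search, whereas this sketch uses a brute-force index for simplicity.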
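
The CIFAR-100 optimizer settings quoted above (peak learning rate 0.4, linear warmup for 15 epochs, 0.1x decay after epochs 96, 192, and 224) translate into a simple schedule. The sketch below is an illustrative reading of that description (the function name is ours; the paper provides no code), assuming the decay factor is applied cumulatively at each listed epoch.

```python
def cifar_learning_rate(epoch, peak_lr=0.4, warmup_epochs=15,
                        decay_epochs=(96, 192, 224), decay_factor=0.1):
    """Linear warmup to the peak rate, then step decay at the listed epochs."""
    if epoch < warmup_epochs:
        return peak_lr * (epoch + 1) / warmup_epochs
    lr = peak_lr
    for boundary in decay_epochs:
        if epoch >= boundary:
            lr *= decay_factor
    return lr
```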