Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing
Authors: Adel Javanmard, Rudrajit Das, Alessandro Epasto, Vahab Mirrokni
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that extensions of the optimal $g_t$ derived in Theorem 3.2 is very effective for improving the performance of standard linear probing (i.e., fitting a linear layer on top of a pretrained model) as well as full network training with the cross-entropy loss for binary classification in the presence of label noise... We compare BayesMix RT/BayesMix-Simple RT with full RT and consensus-based RT proposed in [9]. For all our experiments, we use the body of a ResNet-50 model pretrained on ImageNet... In Tables 1 and 2, we list the average test accuracies of full RT, consensus-based RT, and BayesMix RT (28) after 1 and 10 iterations for MedMNIST Pneumonia corrupted by the uniform noise model with p = 0.45... |
| Researcher Affiliation | Collaboration | Adel Javanmard (University of Southern California, Google Research); Rudrajit Das (Google Research); Alessandro Epasto (Google Research); Vahab Mirrokni (Google Research) |
| Pseudocode | No | The paper describes iterative procedures and update rules using mathematical equations (4)-(5) and (17)-(18), but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: All our datasets are publicly available (links provided in the paper). We have provided experimental details in Appendix N to reproduce the results. |
| Open Datasets | Yes | We consider two datasets available on TensorFlow: (i) MedMNIST Pneumonia [42] which is a medical binary classification dataset, and (ii) Food-101 [6] which is a multi-class food-based classification dataset... MedMNIST Pneumonia (https://www.tensorflow.org/datasets/catalog/pneumonia_mnist)... Food-101 (https://www.tensorflow.org/datasets/catalog/food101) |
| Dataset Splits | Yes | 1. MedMNIST Pneumonia (https://www.tensorflow.org/datasets/catalog/pneumonia_mnist): This has 4708 training examples and comes with a validation set of size 200. The test set consists of 624 examples. 2. Food-101 (https://www.tensorflow.org/datasets/catalog/food101): Each class in Food-101 has 750 training examples; so the total number of examples for two classes (pho vs. ramen and spaghetti bolognese vs. spaghetti carbonara) is 1500. Out of these 1500 examples, we randomly select 100 examples as our validation set. The test set consists of 500 examples in total. |
| Hardware Specification | Yes | Our experiments were done using TensorFlow and Keras on one 128 GB CPU and one 40 GB A100 GPU (per run). |
| Software Dependencies | No | Our experiments were done using TensorFlow and Keras on one 128 GB CPU and one 40 GB A100 GPU (per run). The paper mentions TensorFlow and Keras but does not specify their version numbers. |
| Experiment Setup | Yes | For initial training as well as for each iteration of retraining, the optimizer is Adam (with default values of $\beta_1 = 0.9$ and $\beta_2 = 0.999$) with batch size = 32 & number of epochs = 10 for linear probing and batch size = 128 & number of epochs = 2 for full network training. We also apply weight decay = 0.1 in the case of full network training to mitigate overfitting. We tune the learning rate by monitoring the accuracy on a small clean validation set... We tune $\eta_{\text{adv}}$, $\eta_0$ and $\eta_1$ from $\{5 \times 10^{-3}, 10^{-3}, 5 \times 10^{-4}, 10^{-4}, 5 \times 10^{-5}, 10^{-5}, 5 \times 10^{-6}, 10^{-6}\}$. |
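
The setup quoted in the table can be summarized as a small configuration sketch. This is not the authors' code (the paper does not release any). It is a minimal illustration under stated assumptions: the learning-rate grid is read as negative powers of ten, the "uniform noise model" is taken to mean that each binary label is flipped independently with probability p, and the names `SETUPS`, `flip_labels_uniform`, and `pick_learning_rate` are hypothetical helpers introduced here.

```python
import random

# Learning-rate grid from the Experiment Setup row
# (assumption: the exponents are negative, i.e. 5e-3 down to 1e-6).
LEARNING_RATES = [5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5, 5e-6, 1e-6]

# Optimizer and schedule settings quoted in the same row.
# The dictionary layout is illustrative, not taken from the paper.
SETUPS = {
    "linear_probing": {
        "optimizer": "adam", "beta_1": 0.9, "beta_2": 0.999,
        "batch_size": 32, "epochs": 10, "weight_decay": None,
    },
    "full_network": {
        "optimizer": "adam", "beta_1": 0.9, "beta_2": 0.999,
        "batch_size": 128, "epochs": 2, "weight_decay": 0.1,
    },
}


def flip_labels_uniform(labels, p, seed=0):
    """Flip each binary (0/1) label independently with probability p.

    Assumption: this is what "uniform noise model" means in the quoted text.
    """
    rng = random.Random(seed)
    return [1 - y if rng.random() < p else y for y in labels]


def pick_learning_rate(validation_accuracy, grid=LEARNING_RATES):
    """Return the grid value that maximizes a validation-accuracy callable.

    `validation_accuracy` is a hypothetical function mapping a learning
    rate to accuracy on the small clean validation set.
    """
    return max(grid, key=validation_accuracy)
```

As a usage example, `flip_labels_uniform(train_labels, p=0.45)` would corrupt a list of 0/1 training labels at the noise level reported for MedMNIST Pneumonia, and `pick_learning_rate` would then be called once per method with that method's validation routine.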