Prospective Learning: Learning for a Dynamic Future

Authors: Ashwin De Silva, Rahul Ramesh, Rubing Yang, Siyu Yu, Joshua T Vogelstein, Pratik Chaudhari

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST and CIFAR-10. Code at https://github.com/neurodata/prolearn.
Researcher Affiliation | Academia | Ashwin De Silva¹, Rahul Ramesh², Rubing Yang², Siyu Yu¹, Joshua T. Vogelstein¹, Pratik Chaudhari² (equal contribution noted). Email: {ldesilv2, syu80, jovo}@jhu.edu, {rahulram, rubingy, pratikac}@upenn.edu
Pseudocode | No | The paper describes algorithms and processes textually but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or structured code-like formatting.
Open Source Code | Yes | Code at https://github.com/neurodata/prolearn.
Open Datasets | Yes | Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST [10] and CIFAR-10 [11] data.
Dataset Splits | No | The paper does not explicitly mention using a separate validation set for hyperparameter tuning or early stopping. It states: "Learners are trained on data from the first t time steps (z≤t) and prospective risk is computed using samples from the remaining time steps." (A sketch of this temporal split appears below the table.) In the NeurIPS checklist the authors write: "We have conducted extremely thorough train/test splits, and tuned hyperparameters manually across multiple runs."
Hardware Specification | No | The paper mentions "GPU hours" in the NeurIPS checklist (Question 8) but does not provide specific details such as GPU models, CPU models, or memory specifications used for the experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for its implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Hyper-parameters: all networks are trained using stochastic gradient descent (SGD) with Nesterov's momentum and a cosine-annealed learning rate. The learning rate is 0.1 for the synthetic tasks and 0.01 for MNIST and CIFAR, and the weight decay is set to 1 × 10⁻⁵. Images from MNIST and CIFAR-10 are normalized to have mean 0.5 and standard deviation 0.25. The models are trained for 100 epochs, which is many epochs after reaching a training accuracy of 1. (A sketch of this configuration appears below the table.)
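
The Experiment Setup row reports the optimizer, learning-rate schedule, weight decay, normalization, and epoch count, but the paper does not name its framework or batch size. Below is a minimal sketch of that configuration, assuming PyTorch and torchvision; the placeholder model, the DataLoader settings, and the momentum value of 0.9 are assumptions made for illustration, not the authors' code.

```python
# Sketch of the reported training configuration (assumed PyTorch).
# Only the hyper-parameters named in the paper are taken from it:
# SGD with Nesterov momentum, cosine-annealed LR (0.01 for MNIST/CIFAR,
# 0.1 for synthetic tasks), weight decay 1e-5, mean-0.5/std-0.25
# normalization, 100 epochs. Everything else is a placeholder.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=(0.5,), std=(0.25,)),  # paper: mean 0.5, std 0.25
])
train_set = torchvision.datasets.MNIST("data/", train=True, download=True,
                                       transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,  # batch size assumed
                                           shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, 10))  # placeholder network
criterion = nn.CrossEntropyLoss()

epochs = 100
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,   # 0.1 for synthetic tasks
                            momentum=0.9, nesterov=True,   # momentum value assumed
                            weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine-annealed learning rate, stepped once per epoch
```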
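
The Dataset Splits row quotes the paper's evaluation protocol: learners are trained on data from the first t time steps and prospective risk is computed on the remaining ones. The sketch below illustrates one way such a temporal split could be implemented; the function name and the data layout (a list of (x, y) samples ordered by time) are assumptions for illustration, not the authors' code.

```python
# Illustrative temporal split: train on the first t time steps,
# evaluate (prospective risk) on everything that comes afterwards.
# The data layout is an assumption made for this sketch.
from typing import List, Tuple


def prospective_split(samples_by_time: List[Tuple[object, object]], t: int):
    """samples_by_time[i] holds the (x, y) sample observed at time step i."""
    train = samples_by_time[:t]    # z_1, ..., z_t
    future = samples_by_time[t:]   # held out to estimate prospective risk
    return train, future


# Example: train on the first 80 time steps, evaluate on the rest.
# train_data, future_data = prospective_split(stream, t=80)
```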