Prospective Learning: Learning for a Dynamic Future
Authors: Ashwin De Silva, Rahul Ramesh, Rubing Yang, Siyu Yu, Joshua T. Vogelstein, Pratik Chaudhari
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST and CIFAR-10. Code at https://github.com/neurodata/prolearn. |
| Researcher Affiliation | Academia | Ashwin De Silva¹, Rahul Ramesh², Rubing Yang², Siyu Yu¹, Joshua T. Vogelstein¹, Pratik Chaudhari² (equal contribution). Email: {ldesilv2, syu80, jovo}@jhu.edu, {rahulram, rubingy, pratikac}@upenn.edu |
| Pseudocode | No | The paper describes algorithms and processes textually but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or structured code-like formatting. |
| Open Source Code | Yes | Code at https://github.com/neurodata/prolearn. |
| Open Datasets | Yes | Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST [10] and CIFAR-10 [11] data. |
| Dataset Splits | No | The paper does not explicitly mention using a separate validation set for hyperparameter tuning or early stopping. It states: "Learners are trained on data from the first t time steps (z_t) and prospective risk is computed using samples from the remaining time steps." (see the sketch of this temporal split after the table), and the NeurIPS checklist notes: "We have conducted extremely thorough train/test splits, and tuned hyperparameters manually across multiple runs." |
| Hardware Specification | No | The paper mentions "GPU hours" in the NeurIPS checklist (Question 8) but does not provide specific details such as GPU models, CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for its implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Hyper-parameters: All the networks are trained using stochastic gradient descent (SGD) with Nesterov's momentum and a cosine-annealed learning rate. The networks are trained at a learning rate of 0.1 for the synthetic tasks and 0.01 for MNIST and CIFAR. The weight decay is set to 1 × 10⁻⁵. The images from MNIST and CIFAR-10 are normalized to have mean 0.5 and standard deviation 0.25. The models were trained for 100 epochs, which is many epochs after achieving a training accuracy of 1. (A hedged sketch of this configuration appears after the table.) |
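
The Dataset Splits row quotes the paper's evaluation protocol: train on the first t time steps and compute prospective risk on the remaining time steps. The sketch below illustrates one way to estimate that quantity; the function names (`fit`, `loss`), the data layout (one list of (x, y) pairs per time step), and the plain averaging over future steps are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the quoted protocol: train on the first t time steps,
# estimate prospective risk as the average loss over the remaining steps.
def prospective_risk_estimate(data_by_time, t, fit, loss):
    """data_by_time: list indexed by time step; each entry is a list of (x, y) pairs.
    fit: callable mapping a list of (x, y) pairs to a trained model.
    loss: callable mapping (model, x, y) to a scalar loss."""
    # Pool all samples from the first t time steps for training.
    train_data = [pair for step in data_by_time[:t] for pair in step]
    future_steps = data_by_time[t:]
    model = fit(train_data)
    # Average the per-time-step loss over the held-out future steps.
    per_step = [
        sum(loss(model, x, y) for x, y in step) / len(step)
        for step in future_steps
    ]
    return sum(per_step) / len(per_step)
```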
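The Experiment Setup row reports the optimizer, learning rates, weight decay, normalization, and epoch count but not the framework, momentum value, batch size, or model. The sketch below is a minimal reconstruction assuming a PyTorch implementation; the momentum value (0.9), batch size (128), and the small MNIST classifier are assumptions, while the learning rate, Nesterov momentum, cosine annealing, weight decay, normalization, and 100 epochs follow the reported values.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Reported: MNIST/CIFAR-10 images normalized to mean 0.5 and std 0.25.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5,), std=(0.25,)),
])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size assumed

# Placeholder model; the paper's architectures are not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))
criterion = nn.CrossEntropyLoss()

epochs = 100  # reported: trained for 100 epochs
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # reported: 0.01 for MNIST/CIFAR, 0.1 for synthetic tasks
    momentum=0.9,       # momentum value assumed; paper only states Nesterov momentum
    nesterov=True,
    weight_decay=1e-5,  # reported: weight decay of 1e-5
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine-annealed learning rate, stepped once per epoch
```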