A new dog learns old tricks: RL finds classic optimization algorithms
Authors: Weiwei Kong, Christopher Liaw, Aranyak Mehta, D. Sivakumar
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our new ideas on the AdWords problem, the online knapsack problem, and the secretary problem. Our results indicate that the models have learned behaviours that are consistent with the optimal algorithms for these problems derived using the online primal-dual framework. |
| Researcher Affiliation | Collaboration | Weiwei Kong (Georgia Institute of Technology, wkong37@gatech.edu); Christopher Liaw (University of British Columbia, cvliaw@cs.ubc.ca); Aranyak Mehta (Google, aranyak@google.com); D. Sivakumar (Google, siva@google.com) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. Algorithms are described narratively within the text. |
| Open Source Code | No | No statement regarding the release of open-source code or a link to a code repository for the described methodology was found. |
| Open Datasets | No | The paper describes how input instances for the experiments are generated (e.g., from uniform distributions U[0,1]^2, or from the specific adversarial graph constructions detailed in Appendix A) rather than using or providing access to publicly available, pre-existing datasets. |
| Dataset Splits | No | The paper describes training on various input lengths and testing on larger input lengths or different instance types, which is characteristic of online learning or generalization. It does not provide explicit train/validation/test dataset splits (e.g., percentages or sample counts) for a fixed dataset, as data is often generated or streamed. |
| Hardware Specification | No | The paper states 'all our training is done over a single machine' but does not provide any specific hardware details such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions the use of 'standard REINFORCE algorithm for policy gradient, with the Adam optimizer' and 'ReLU nonlinearity', but does not provide specific software dependency versions (e.g., Python, PyTorch/TensorFlow versions, or other library versions). |
| Experiment Setup | Yes | We use a feedforward neural network with five hidden layers each with 500 neurons and ReLU nonlinearity. We then train the network using the standard REINFORCE algorithm with a simple fixed learning rate of 10^-4 and a batch size of 10. |
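
Based on the setup quoted in the last row, the following is a minimal PyTorch sketch of the reported architecture and training procedure: five hidden layers of 500 neurons with ReLU, REINFORCE with the Adam optimizer, a fixed learning rate of 10^-4, and a batch size of 10. The online-knapsack instance generator (sizes and values drawn from U[0,1]), the 4-dimensional state encoding, and the reward definition are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Feedforward policy: five hidden layers of 500 units with ReLU, as reported."""

    def __init__(self, input_dim: int, num_actions: int = 2, hidden: int = 500):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(5):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, num_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


def sample_knapsack_instance(n_items, budget=10.0):
    # Assumed generator: item sizes and values drawn i.i.d. from U[0,1];
    # the paper only states that instances come from uniform distributions.
    return torch.rand(n_items), torch.rand(n_items), budget


def run_episode(policy, n_items=50):
    # Roll out one online-knapsack episode; the state encoding
    # (current size, current value, remaining budget, time) is an assumption.
    sizes, values, budget = sample_knapsack_instance(n_items)
    remaining, total_value, log_probs = budget, 0.0, []
    for t in range(n_items):
        state = torch.tensor(
            [sizes[t].item(), values[t].item(), remaining / budget, t / n_items]
        )
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        if action.item() == 1 and sizes[t].item() <= remaining:  # accept if it fits
            remaining -= sizes[t].item()
            total_value += values[t].item()
    return total_value, torch.stack(log_probs)


policy = PolicyNet(input_dim=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)  # lr = 10^-4, as reported

for step in range(1000):
    batch_loss = 0.0
    for _ in range(10):  # batch size of 10 episodes, as reported
        episode_return, log_probs = run_episode(policy)
        # REINFORCE objective: scale the summed log-probs by the episode return
        # (no baseline subtracted, for simplicity of the sketch)
        batch_loss = batch_loss - episode_return * log_probs.sum()
    optimizer.zero_grad()
    (batch_loss / 10).backward()
    optimizer.step()
```

The sketch trains on a single machine with no framework-specific tricks, consistent with the paper's statement that all training is done over a single machine; everything beyond the quoted hyperparameters should be read as a placeholder.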