A new dog learns old tricks: RL finds classic optimization algorithms

Authors: Weiwei Kong, Christopher Liaw, Aranyak Mehta, D. Sivakumar

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our new ideas on the AdWords problem, the online knapsack problem, and the secretary problem. Our results indicate that the models have learned behaviours that are consistent with the optimal algorithms for these problems derived using the online primal-dual framework.
Researcher Affiliation | Collaboration | Weiwei Kong (Georgia Institute of Technology, wkong37@gatech.edu); Christopher Liaw (University of British Columbia, cvliaw@cs.ubc.ca); Aranyak Mehta (Google, aranyak@google.com); D. Sivakumar (Google, siva@google.com)
Pseudocode | No | No structured pseudocode or algorithm blocks were found. Algorithms are described narratively within the text.
Open Source Code | No | No statement regarding the release of open-source code or a link to a code repository for the described methodology was found.
Open Datasets | No | The paper describes how input instances for the experiments are generated (e.g., from uniform distributions U^2[0, 1], or from the specific adversarial graph constructions detailed in Appendix A) rather than using or providing access to publicly available, pre-existing datasets.
Dataset Splits | No | The paper describes training on various input lengths and testing on larger input lengths or different instance types, as is typical for online-learning and generalization experiments. It does not provide explicit train/validation/test splits (e.g., percentages or sample counts) for a fixed dataset, since the data is generated or streamed rather than fixed in advance.
Hardware Specification | No | The paper states that 'all our training is done over a single machine' but does not provide specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions the use of the 'standard REINFORCE algorithm for policy gradient, with the Adam optimizer' and 'ReLU nonlinearity', but does not provide specific software dependency versions (e.g., Python, PyTorch/TensorFlow, or other libraries).
Experiment Setup | Yes | We use a feedforward neural network with five hidden layers each with 500 neurons and ReLU nonlinearity. We then train the network using the standard REINFORCE algorithm with a simple fixed learning rate of 10^-4 and a batch size of 10.
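
The experiment-setup row above is detailed enough to outline the training loop. Below is a minimal sketch in PyTorch, assuming an online-knapsack-style environment: the architecture (five hidden layers of 500 ReLU units), the REINFORCE objective, the Adam optimizer, the fixed 10^-4 learning rate, and the batch size of 10 follow the quoted description, while the instance generator (sizes and values drawn from U[0, 1]), the state encoding, and the reward are hypothetical placeholders, not the authors' implementation; the paper does not name its framework.

# A hypothetical sketch, not the authors' code: the architecture and
# hyperparameters follow the quoted setup; the environment, state encoding,
# and reward are illustrative placeholders.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Feedforward policy: five hidden layers of 500 units with ReLU."""
    def __init__(self, state_dim=3, num_actions=2, hidden=500, depth=5):
        super().__init__()
        layers, d = [], state_dim
        for _ in range(depth):
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, num_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def sample_instance(n=50):
    # Assumed instance generator: item (size, value) pairs drawn from U[0, 1].
    return torch.rand(n, 2)

def run_episode(policy, items, budget=1.0):
    # One online-knapsack-style episode: accept or reject each arriving item.
    remaining, total_value, log_probs = budget, 0.0, []
    for item in items:
        size, value = item[0].item(), item[1].item()
        state = torch.tensor([size, value, remaining])
        dist = policy(state)
        action = dist.sample()                      # 0 = reject, 1 = accept
        log_probs.append(dist.log_prob(action))
        if action.item() == 1 and size <= remaining:
            remaining -= size
            total_value += value
    return total_value, torch.stack(log_probs)

policy = PolicyNet()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)  # fixed LR of 10^-4

for step in range(1000):
    batch_loss = 0.0
    for _ in range(10):                             # batch size of 10
        reward, log_probs = run_episode(policy, sample_instance())
        batch_loss = batch_loss - reward * log_probs.sum()   # REINFORCE objective
    optimizer.zero_grad()
    (batch_loss / 10).backward()
    optimizer.step()

In the paper's experiments the reward would be tied to the learned algorithm's performance on each online instance; here the raw collected value stands in purely for illustration, and no baseline or variance-reduction term is included.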