Can You Improve My Code? Optimizing Programs with Local Search

Authors: Fatemeh Abdollahi, Saqib Ameen, Matthew E. Taylor, Levi H. S. Lelis

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants' programs with respect to the game scores.
Researcher Affiliation | Academia | Fatemeh Abdollahi, Saqib Ameen, Matthew E. Taylor and Levi H. S. Lelis; Department of Computing Science, University of Alberta, Canada; Alberta Machine Intelligence Institute (Amii); {fabdolla, saqib.ameen, matthew.e.taylor, levi.lelis}@ualberta.ca
Pseudocode | Yes | The pseudocode in Algorithm 1 shows the local search algorithm POLIS employs. It receives an existing program p, an evaluation function F, and two time limits, t and tl, for the overall running time of the search and for the time allowed to optimize each line of code, respectively. (A Python sketch of this search loop appears after the table.)
Open Source Code | Yes | Our POLIS implementation and the data collected in our user study are available at https://github.com/FatemehAB/POLIS.
Open Datasets | Yes | Our POLIS implementation and the data collected in our user study are available at https://github.com/FatemehAB/POLIS. ... For the task of writing programmatic policies for playing games, we use the approach introduced by Verma et al. [2018b] to define a set of input-output examples. That is, we train a neural policy that generates a set of input-output pairs... (A sketch of this collection step appears after the table.)
Dataset Splits | No | The paper describes the generation of input-output examples from trained neural policies and the evaluation of programmatic policies in game environments, but it does not explicitly specify traditional train/validation/test dataset splits or a dedicated validation set for hyperparameter tuning.
Hardware Specification | No | The paper mentions 'computational resources from Compute Canada' and the 'Intelligent Robot Learning (IRL) Lab at the University of Alberta' but does not provide specific hardware details like CPU or GPU models, memory, or other specifications.
Software Dependencies | No | The paper mentions software components like 'OpenAI Gym' and 'DQN' but does not specify exact version numbers for these or any other software libraries or programming languages used in the experiments.
Experiment Setup | Yes | We use DQN [Mnih et al., 2015] to train a neural policy π for 2000 episodes. We let the agent follow π in the environment for 2000 steps... We use k = 20 in our experiments. We performed 5 restarts for each run of the system; the result of a run is the best program encountered across the 5 restarts. The game score of both the participants' and POLIS's programs is an average of the score the program obtained in 100 episodes of Lunar Lander and 25 episodes of Highway. (The restart-and-averaging protocol is sketched after the table.)
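
The Pseudocode row summarizes Algorithm 1, POLIS's local search over an existing program. The Python sketch below illustrates that loop under stated assumptions: `F`, `candidate_lines`, and the list-of-source-lines representation are hypothetical stand-ins for the paper's evaluation function and line-level synthesizer, not the authors' implementation.

```python
import time

def polis_local_search(program_lines, F, t, tl, candidate_lines):
    """Line-by-line local search in the spirit of POLIS's Algorithm 1.

    program_lines   -- the user's program as a list of source lines
    F               -- evaluation function: list of lines -> game score
    t, tl           -- overall and per-line time budgets (seconds)
    candidate_lines -- hypothetical generator yielding replacement
                       candidates for line i of the current program
    """
    best = list(program_lines)
    best_score = F(best)
    start = time.time()

    improved = True
    while improved and time.time() - start < t:         # overall budget t
        improved = False
        for i in range(len(best)):                       # optimize one line at a time
            line_start = time.time()
            for candidate in candidate_lines(best, i):   # enumerate replacement lines
                if (time.time() - line_start > tl        # per-line budget tl
                        or time.time() - start > t):
                    break
                trial = best[:i] + [candidate] + best[i + 1:]
                score = F(trial)
                if score > best_score:                   # keep strictly better programs
                    best, best_score = trial, score
                    improved = True
    return best, best_score
```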
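
The Open Datasets row quotes the paper's use of Verma et al. [2018b]'s strategy of deriving input-output examples from a trained neural policy. A minimal sketch of that collection step follows; `neural_policy` is a hypothetical helper, the classic Gym `reset`/`step` signatures are assumed, and the 2000-step rollout length comes from the Experiment Setup row.

```python
def collect_io_examples(env, neural_policy, n_steps=2000):
    """Roll out a trained neural policy (e.g., the DQN policy pi) and record
    (observation, action) pairs as input-output examples for synthesis.
    Assumes the classic Gym API: reset() returns the observation and
    step() returns (obs, reward, done, info)."""
    examples = []
    obs = env.reset()
    for _ in range(n_steps):
        action = neural_policy(obs)        # hypothetical: maps observation -> action
        examples.append((obs, action))     # one input-output pair
        obs, _, done, _ = env.step(action)
        if done:
            obs = env.reset()
    return examples
```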
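
The Experiment Setup row reports 5 restarts per run and game scores averaged over 100 Lunar Lander or 25 Highway episodes. The sketch below shows one way to implement that protocol; `run_polis_once` and `make_env` are hypothetical helpers, and the classic Gym API is again assumed.

```python
import statistics

def average_score(policy, make_env, n_episodes):
    """Average game score over n_episodes (100 for Lunar Lander, 25 for Highway)."""
    scores = []
    for _ in range(n_episodes):
        env = make_env()
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        scores.append(total)
    return statistics.mean(scores)

def best_of_restarts(run_polis_once, n_restarts=5):
    """The result of a run is the best program found across n_restarts restarts."""
    results = [run_polis_once(seed=s) for s in range(n_restarts)]  # (program, score) pairs
    return max(results, key=lambda r: r[1])
```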