Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces

Authors: Tales Henrique Carvalho, Kenneth Tjhia, Levi Lelis

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we show that the programmatic space, induced by the domain-specific language and requiring no training, presents values for the behavior loss similar to those observed in latent spaces presented in previous work. Moreover, algorithms searching in the programmatic space significantly outperform those in LEAPS and HPRL. To explain our results, we measured the friendliness of the two spaces to local search algorithms.
Researcher Affiliation | Academia | Tales H. Carvalho, Kenneth Tjhia, Levi H. S. Lelis; Amii, Department of Computing Science, University of Alberta; {taleshen,tjhia,levi.lelis}@ualberta.ca
Pseudocode | Yes | Algorithm 1: Hill Climbing for Programmatic Policies
Open Source Code | Yes | The codebase used in this work is available online.
Open Datasets | Yes | We consider the KAREL and KAREL-HARD problem sets to define tasks. The KAREL set contains the tasks STAIRCLIMBER, MAZE, FOURCORNERS, TOPOFF, HARVESTER and CLEANHOUSE, all introduced by Trivedi et al. (2021).
Dataset Splits | No | The paper evaluates policies by their expected return over a set of initial states and describes several search algorithms, but it does not specify a separate validation split.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes the problem domain and the algorithms used, but it does not list specific software dependencies with version numbers (e.g., the Python version or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | For CEBS, we set the dimension of the latent vector d = 256, the neighborhood size K = 64, the elite size E = 16, and the noise σ = 0.25. The hyperparameters for CEM and HPRL are exactly as described in their papers.
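The Experiment Setup row lists the CEBS hyperparameters (d, K, E, σ) but not the update rule they plug into. The following is a minimal sketch of a cross-entropy beam search over a latent space, written under our own assumptions rather than taken from the authors' codebase. We assume that each of the E elites spawns K/E neighbors by adding Gaussian noise with standard deviation σ, that the E best neighbors form the next beam, and that the search stops once no neighbor beats the best score found so far. The `evaluate` callable and the `toy_return` objective are hypothetical stand-ins for decoding a latent vector into a program and averaging its return over a set of initial states.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def cebs(
    evaluate: Callable[[Vector], float],
    d: int = 256,         # dimension of the latent vector
    k: int = 64,          # neighborhood size: candidates evaluated per iteration
    e: int = 16,          # elite size: candidates kept in the beam
    sigma: float = 0.25,  # std. dev. of the Gaussian noise added to each elite
    max_iters: int = 1000,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Sketch of cross-entropy beam search; the update and stopping rules are assumptions."""
    rng = random.Random(seed)
    # Initial candidates are sampled from a standard normal prior over the latent space.
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scored = sorted(((evaluate(z), z) for z in candidates), key=lambda p: p[0], reverse=True)
    best_score, best_z = scored[0]
    elites = [z for _, z in scored[:e]]

    for _ in range(max_iters):
        # Each elite spawns k // e neighbors by adding isotropic Gaussian noise.
        neighbors = [
            [x + rng.gauss(0.0, sigma) for x in elite]
            for elite in elites
            for _ in range(k // e)
        ]
        scored = sorted(((evaluate(z), z) for z in neighbors), key=lambda p: p[0], reverse=True)
        if scored[0][0] <= best_score:
            break  # assumed stopping rule: no neighbor improves on the best so far
        best_score, best_z = scored[0]
        # The E best neighbors become the next beam.
        elites = [z for _, z in scored[:e]]
    return best_z, best_score


def toy_return(z: Vector) -> float:
    # Hypothetical stand-in for a policy's average return; maximized at the origin.
    return -sum(x * x for x in z)


if __name__ == "__main__":
    z, score = cebs(toy_return)
    print(f"best toy score: {score:.2f}")
```

Keeping E separate elites as a beam, instead of refitting a single sampling mean to them as a standard CEM variant would, lets the search follow several promising regions of the latent space at once. Because the best score only changes when a neighbor strictly improves on it, the returned score is never worse than the best initial sample.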