Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces
Authors: Tales Henrique Carvalho, Kenneth Tjhia, Levi Lelis
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that the programmatic space, induced by the domain-specific language and requiring no training, presents values for the behavior loss similar to those observed in latent spaces presented in previous work. Moreover, algorithms searching in the programmatic space significantly outperform those in LEAPS and HPRL. To explain our results, we measured the friendliness of the two spaces to local search algorithms. |
| Researcher Affiliation | Academia | Tales H. Carvalho, Kenneth Tjhia, Levi H. S. Lelis. Amii, Department of Computing Science, University of Alberta. {taleshen,tjhia,levi.lelis}@ualberta.ca |
| Pseudocode | Yes | Algorithm 1 Hill Climbing for Programmatic Policies |
| Open Source Code | Yes | The codebase used in this work is available online. |
| Open Datasets | Yes | We consider the KAREL and KAREL-HARD problem sets to define tasks. The KAREL set contains the tasks STAIRCLIMBER, MAZE, FOURCORNERS, TOPOFF, HARVESTER and CLEANHOUSE, all introduced by Trivedi et al. (2021). |
| Dataset Splits | No | The paper evaluates policies based on expected return over a set of initial states and describes various search algorithms, but it does not specify a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper describes the problem domain and the algorithms used, but it does not list specific software dependencies with their version numbers (e.g., Python version, library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | For CEBS, we set the dimension of the latent vector d = 256, the neighborhood size K = 64, the elite size E = 16, and the noise σ = 0.25. The hyperparameters for CEM and HPRL are exactly as described in their papers. |
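
To make the reported CEBS hyperparameters concrete, the sketch below records them in a small config object and runs a toy elite-and-neighborhood search in a latent space. It is only an illustration under stated assumptions: `CEBSConfig`, `toy_return`, and `run_search` are hypothetical names, the loop is a generic elite-based search and not the authors' CEBS implementation, and `toy_return` stands in for decoding a latent vector into a program and measuring its return in a Karel task.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CEBSConfig:
    """Hyperparameter values quoted in the Experiment Setup row."""
    latent_dim: int = 256         # d, dimension of the latent vector
    neighborhood_size: int = 64   # K, candidates sampled per iteration
    elite_size: int = 16          # E, candidates kept per iteration
    noise_sigma: float = 0.25     # sigma, std of the Gaussian perturbation


def toy_return(z):
    """Hypothetical objective: a real setup would decode z into a program
    and evaluate its return over a set of initial states."""
    return -sum(x * x for x in z)


def run_search(cfg, iterations, seed=0, evaluate=toy_return):
    """Generic elite-based search (an assumption, not the paper's algorithm).

    Returns the best score found at each iteration.
    """
    rng = random.Random(seed)
    # Start from K random latent vectors.
    population = [
        [rng.gauss(0.0, 1.0) for _ in range(cfg.latent_dim)]
        for _ in range(cfg.neighborhood_size)
    ]
    history = []
    for _ in range(iterations):
        # Keep the E best candidates.
        elites = sorted(population, key=evaluate, reverse=True)[: cfg.elite_size]
        history.append(evaluate(elites[0]))
        # Sample K neighbors by adding Gaussian noise to randomly chosen elites.
        neighbors = [
            [x + rng.gauss(0.0, cfg.noise_sigma) for x in rng.choice(elites)]
            for _ in range(cfg.neighborhood_size)
        ]
        # Elites are carried over, so the best score never decreases.
        population = elites + neighbors
    return history
```

Calling `run_search(CEBSConfig(), iterations=10)` returns one best score per iteration. Because the elites are retained in the next population, the sequence is non-decreasing, which makes the effect of changing `noise_sigma` or `elite_size` easy to compare across runs.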