ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Authors: Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On five datasets (LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text game), MATH, and TabMWP), both open-source and proprietary LLMs improve in accuracy when predicting programs with REGAL functions. |
| Researcher Affiliation | Academia | Elias Stengel-Eskin*¹, Archiki Prasad*¹, Mohit Bansal¹. ¹UNC Chapel Hill. |
| Pseudocode | Yes | Algorithm 1 REGAL: Training Algorithm; Algorithm 2 REGAL: Testing Algorithm |
| Open Source Code | Yes | Code: https://github.com/esteng/regal_program_learning. |
| Open Datasets | Yes | We explore five datasets: LOGO (Ellis et al., 2021; Wong et al., 2021), a program induction task; a date reasoning task (Srivastava et al., 2022) known to challenge LLMs (Suzgun et al., 2022); TextCraft (Prasad et al., 2023), a text-based game for crafting Minecraft objects; a subset of MATH (Hendrycks et al., 2021)... and TabMWP (Lu et al., 2022)... |
| Dataset Splits | Yes | We use the small train/test splits (200/111) from Wong et al. (2021) and take 100 dev examples from the large train set. ... Specifically, we split their predicted programs from GPT-3.5 into train, dev, and test splits (66/113/180) ... giving us a train/dev/test split of 190/50/77. ... This gives us a train/dev/test split of 194/61/74. ... This gives us a train/dev/test split of 194/60/74. |
| Hardware Specification | No | The paper mentions various LLMs used (e.g., Code Llama, GPT-3.5, Lemur) but does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to run the experiments. |
| Software Dependencies | Yes | For GPT-3.5, we use the gpt-3.5-turbo version (0613). All Code Llama models use the Code Llama-Instruct-hf versions, and we use the lemur-70b-v1 version of Lemur. |
| Experiment Setup | Yes | We use the dev set to select hyperparameter values, reported in Appendix C. All prompts can be found in Appendix D. ... Table 9 lists the refactoring and testing hyperparameters used for each domain. Settings (LOGO / Date / TextCraft): rounds of refactoring 3 / 1 / 1; editEvery 5 / 5 / 5; pruneEvery 5 / 5 / 5; add comments True / False / False; batch size 5 / 3 / 4; filtering threshold 0.0 / 0.0 / 0.0; filter before testing True / True / False; ICL budget ratio 0.5 / 0.5 / 0.5. |
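
The Table 9 hyperparameters quoted in the Experiment Setup row can be transcribed into a small configuration object, which makes the per-domain differences easier to see. This is only an illustrative sketch: the class name `RefactorConfig`, its field names, and the helper `settings_that_vary` are hypothetical and are not taken from the paper or its released code. Only the values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RefactorConfig:
    """Hypothetical container for one column of Table 9 (field names are assumptions)."""
    refactor_rounds: int
    edit_every: int
    prune_every: int
    add_comments: bool
    batch_size: int
    filtering_threshold: float
    filter_before_testing: bool
    icl_budget_ratio: float


# Values transcribed from the Table 9 excerpt above; keys are the three listed domains.
TABLE_9 = {
    "LOGO": RefactorConfig(
        refactor_rounds=3, edit_every=5, prune_every=5, add_comments=True,
        batch_size=5, filtering_threshold=0.0, filter_before_testing=True,
        icl_budget_ratio=0.5,
    ),
    "Date": RefactorConfig(
        refactor_rounds=1, edit_every=5, prune_every=5, add_comments=False,
        batch_size=3, filtering_threshold=0.0, filter_before_testing=True,
        icl_budget_ratio=0.5,
    ),
    "TextCraft": RefactorConfig(
        refactor_rounds=1, edit_every=5, prune_every=5, add_comments=False,
        batch_size=4, filtering_threshold=0.0, filter_before_testing=False,
        icl_budget_ratio=0.5,
    ),
}


def settings_that_vary(configs):
    """Return the sorted names of settings whose value differs across domains."""
    rows = [asdict(cfg) for cfg in configs.values()]
    return sorted(key for key in rows[0] if len({row[key] for row in rows}) > 1)


if __name__ == "__main__":
    print(settings_that_vary(TABLE_9))
```

Under these assumptions, only four settings differ between domains (rounds of refactoring, comment insertion, batch size, and filtering before testing); the remaining four are shared across LOGO, Date, and TextCraft.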