Synthesizing Programmatic Policies that Inductively Generalize
Authors: Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our algorithm and evaluate it on a set of reinforcement learning problems focused on tasks that require inductive generalization. We show that traditional deep RL approaches perform well on the original task, but fail to generalize inductively, whereas our state machine policies successfully generalize beyond the training distribution. |
| Researcher Affiliation | Academia | Jeevana Priya Inala (MIT CSAIL, jinala@csail.mit.edu); Osbert Bastani (University of Pennsylvania, obastani@seas.upenn.edu); Zenna Tavares (MIT CSAIL, zenna@mit.edu); Armando Solar-Lezama (MIT CSAIL, asolar@csail.mit.edu) |
| Pseudocode | Yes | Algorithm 1 Greedy algorithm for learning switching conditions. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing its own code. It only references 'OpenAI Baselines' as a third-party tool. |
| Open Datasets | No | The paper defines problem instances for training, such as 'd ∈ [12, 13.5] m' for Car or 'x_dist = 40 m' for Quad, but does not provide access to a publicly available dataset or a generator for these instances. |
| Dataset Splits | No | The paper explicitly mentions 'training distribution' and 'test distribution' but does not specify a separate validation split or dataset. |
| Hardware Specification | No | The paper mentions 'We used a parallelized implementation with 10 threads' but does not specify any concrete hardware details (e.g., CPU/GPU models, memory, or cloud platform specifics) used for running experiments. |
| Software Dependencies | No | The paper mentions the 'PPO2 implementation from OpenAI Baselines (Dhariwal et al., 2017)' but does not list version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | There are three main hyper-parameters in our algorithm: [...] We use λ = 100 for all our experiments. [...] The number of training minibatches per update, nminibatches ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}. [...] The policy entropy coefficient in the optimization objective, ent_coef ∈ {0.0, 0.01, 0.05, 0.1}. [...] The number of training epochs per update, noptepochs ∈ {3, ..., 36}. [...] The clipping range, cliprange ∈ {0.1, 0.2, 0.3}. [...] The learning rate, lr ∈ [5 · 10⁻⁶, 0.003]. (An illustrative sketch of sampling from these ranges follows the table.) |
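
For readers attempting to reproduce the PPO2 baseline search, the following is a minimal sketch of how the hyper-parameter ranges quoted in the "Experiment Setup" row could be sampled and passed to the PPO2 implementation from OpenAI Baselines (Dhariwal et al., 2017) that the paper cites. The environment (`CartPole-v1`), the timestep budget, and the log-uniform learning-rate draw are illustrative assumptions; the paper's own Car and Quad tasks and its exact search procedure are not described here.

```python
# Minimal sketch only: samples one PPO2 configuration from the ranges quoted
# above and trains it with the OpenAI Baselines PPO2 implementation cited by
# the paper. The environment, timestep budget, and log-uniform learning-rate
# draw are assumptions, not the paper's Car/Quad setup.
import random

import gym
import numpy as np
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

# Hyper-parameter ranges as reported above.
NMINIBATCHES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
ENT_COEF = [0.0, 0.01, 0.05, 0.1]
NOPTEPOCHS = list(range(3, 37))  # {3, ..., 36}
CLIPRANGE = [0.1, 0.2, 0.3]
LR_LOW, LR_HIGH = 5e-6, 0.003


def sample_config(rng):
    """Draw one PPO2 configuration from the reported search space."""
    return {
        # Powers of 2, so each choice divides the default batch of 2048 steps.
        "nminibatches": rng.choice(NMINIBATCHES),
        "ent_coef": rng.choice(ENT_COEF),
        "noptepochs": rng.choice(NOPTEPOCHS),
        "cliprange": rng.choice(CLIPRANGE),
        # Log-uniform draw, since the interval spans roughly three orders of magnitude.
        "lr": float(np.exp(rng.uniform(np.log(LR_LOW), np.log(LR_HIGH)))),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    config = sample_config(rng)
    # Placeholder environment; the paper evaluates on its own Car and Quad tasks.
    env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
    ppo2.learn(
        network="mlp",
        env=env,
        total_timesteps=100_000,  # placeholder training budget
        **config,
    )
```

The paper also mentions a parallelized implementation with 10 threads; how that parallelism interacts with the PPO2 training loop is not specified, so the sketch above runs a single environment.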