Synthesizing Programmatic Policies that Inductively Generalize
Authors: Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our algorithm and evaluate it on a set of reinforcement learning problems focused on tasks that require inductive generalization. We show that traditional deep RL approaches perform well on the original task, but fail to generalize inductively, whereas our state machine policies successfully generalize beyond the training distribution. |
| Researcher Affiliation | Academia | Jeevana Priya Inala (MIT CSAIL, jinala@csail.mit.edu); Osbert Bastani (University of Pennsylvania, obastani@seas.upenn.edu); Zenna Tavares (MIT CSAIL, zenna@mit.edu); Armando Solar-Lezama (MIT CSAIL, asolar@csail.mit.edu) |
| Pseudocode | Yes | Algorithm 1 Greedy algorithm for learning switching conditions. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing its own code. It only references 'OpenAI Baselines' as a third-party tool. |
| Open Datasets | No | The paper defines problem instances for training, such as 'd ∈ [12, 13.5] m' for Car or 'x_dist = 40 m' for Quad, but does not provide access to a publicly available dataset or a generator for these instances. |
| Dataset Splits | No | The paper explicitly mentions 'training distribution' and 'test distribution' but does not specify a separate validation split or dataset. |
| Hardware Specification | No | The paper mentions 'We used a parallelized implementation with 10 threads' but does not specify any concrete hardware details (e.g., CPU/GPU models, memory, or cloud platform specifics) used for running experiments. |
| Software Dependencies | No | The paper mentions the 'PPO2 implementation from OpenAI Baselines (Dhariwal et al., 2017)' but does not list version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | There are three main hyper-parameters in our algorithm: [...] We use λ = 100 for all our experiments. [...] The number of training minibatches per update, nminibatches ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}. [...] The policy entropy coefficient in the optimization objective, ent_coef ∈ {0.0, 0.01, 0.05, 0.1}. [...] The number of training epochs per update, noptepochs ∈ {3, ..., 36}. [...] The clipping range, cliprange ∈ {0.1, 0.2, 0.3}. [...] The learning rate, lr ∈ [5 · 10⁻⁶, 0.003]. (An illustrative sketch of sampling from these ranges follows the table.) |
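
For readers attempting to reproduce the PPO2 baseline search, the following is a minimal sketch of how the hyper-parameter ranges quoted in the "Experiment Setup" row could be sampled and passed to the PPO2 implementation from OpenAI Baselines (Dhariwal et al., 2017) that the paper cites. The environment (`CartPole-v1`), the timestep budget, and the log-uniform learning-rate draw are illustrative assumptions; the paper's own Car and Quad tasks and its exact search procedure are not described here.

```python
# Minimal sketch only: samples one PPO2 configuration from the ranges quoted
# above and trains it with the OpenAI Baselines PPO2 implementation cited by
# the paper. The environment, timestep budget, and log-uniform learning-rate
# draw are assumptions, not the paper's Car/Quad setup.
import random

import gym
import numpy as np
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

# Hyper-parameter ranges as reported above.
NMINIBATCHES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
ENT_COEF = [0.0, 0.01, 0.05, 0.1]
NOPTEPOCHS = list(range(3, 37))  # {3, ..., 36}
CLIPRANGE = [0.1, 0.2, 0.3]
LR_LOW, LR_HIGH = 5e-6, 0.003


def sample_config(rng):
    """Draw one PPO2 configuration from the reported search space."""
    return {
        # Powers of 2, so each choice divides the default batch of 2048 steps.
        "nminibatches": rng.choice(NMINIBATCHES),
        "ent_coef": rng.choice(ENT_COEF),
        "noptepochs": rng.choice(NOPTEPOCHS),
        "cliprange": rng.choice(CLIPRANGE),
        # Log-uniform draw, since the interval spans roughly three orders of magnitude.
        "lr": float(np.exp(rng.uniform(np.log(LR_LOW), np.log(LR_HIGH)))),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    config = sample_config(rng)
    # Placeholder environment; the paper evaluates on its own Car and Quad tasks.
    env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
    ppo2.learn(
        network="mlp",
        env=env,
        total_timesteps=100_000,  # placeholder training budget
        **config,
    )
```

The paper also mentions a parallelized implementation with 10 threads; how that parallelism interacts with the PPO2 training loop is not specified, so the sketch above runs a single environment.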