Few-Shot Bayesian Imitation Learning with Logical Program Policies

Authors: Tom Silver, Kelsey R. Allen, Alex K. Lew, Leslie Pack Kaelbling, Josh Tenenbaum

AAAI 2020, pp. 10251-10258

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we study six strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20-1,000x more data efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.
Researcher Affiliation | Academia | Tom Silver, Kelsey R. Allen, Alex K. Lew, Leslie Kaelbling, Josh Tenenbaum, Massachusetts Institute of Technology, {tslvr, krallen, alexlew, lpk, jbt}@mit.edu
Pseudocode | Yes | Algorithm 1: LPP imitation learning
input: Demos D, ensemble size K, max iters L
Create anti-demos D' = {(s, a') : (s, a) ∈ D, a' ≠ a};
Set labels y[(s, a)] = 1 if (s, a) ∈ D else 0;
Initialize approximate posterior q;
for i in 1, ..., L do
    f_i = generate_next_feature();
    X = {(f_1(s, a), ..., f_i(s, a))^T : (s, a) ∈ D ∪ D'};
    μ_i, w_i = logical_inference(X, y, p(f), K);
    update_posterior(q, μ_i, w_i);
end
return q;
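A minimal Python sketch of this loop is given below, assuming the subroutines named in the pseudocode (feature enumeration, decision-tree-based logical inference, and the posterior update) are supplied as callables; these names are placeholders mirroring the pseudocode, not the paper's actual implementation.

def lpp_imitation_learning(demos, actions, feature_prior, K, max_iters,
                           generate_next_feature, logical_inference,
                           update_posterior, q):
    """Sketch of Algorithm 1. `demos` is a list of (state, action) pairs,
    `actions` is the shared action set, `feature_prior` is p(f), K is the
    ensemble size, and `q` is an initialized approximate posterior. The
    last four arguments stand in for the paper's subroutines."""
    # Anti-demonstrations: in each demonstrated state, every action the
    # demonstrator did not take becomes a negative example.
    anti_demos = [(s, a2) for (s, a) in demos for a2 in actions if a2 != a]
    data = demos + anti_demos
    labels = [1] * len(demos) + [0] * len(anti_demos)

    features = []  # f_1, ..., f_i enumerated so far
    for _ in range(max_iters):
        features.append(generate_next_feature())
        # One row per (state, action) pair, one column per enumerated feature.
        X = [[f(s, a) for f in features] for (s, a) in data]
        # Ensemble of K candidate logical formulas (mu) with weights (w).
        mus, ws = logical_inference(X, labels, feature_prior, K)
        update_posterior(q, mus, ws)
    return q

Passing the subroutines as arguments keeps the sketch self-contained while leaving their implementations unspecified, as in the paper's pseudocode.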
Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. It does not mention a repository link or explicitly state that the code is being released.
Open Datasets | No | The paper describes using 'six strategy games' and states 'Instances of Nim, Checkmate Tactic, and Reach for the Star are procedurally generated; instances of Stop the Fall, Chase, and Fence In are manually generated'. However, it does not provide any links, DOIs, or citations to publicly available datasets or repositories for these game instances.
Dataset Splits | Yes | For each number of demonstrations, we run leave-one-out cross validation: 10 trials, each featuring a distinct set of demonstrations drawn from the overall pool of 11 training demonstrations.
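As a rough illustration of that split scheme, the sketch below draws 10 distinct demonstration subsets of a given size from an 11-demonstration pool; whether the subsets must be disjoint or merely distinct is not stated in the quote, so treating them as merely distinct is an assumption.

import random

def make_trials(demo_pool, n_demos, n_trials=10, seed=0):
    """Draw `n_trials` distinct subsets of size `n_demos` from `demo_pool`
    (11 demonstrations in the paper). Sampling details are assumptions."""
    rng = random.Random(seed)
    seen, trials = set(), []
    while len(trials) < n_trials:
        idx = tuple(sorted(rng.sample(range(len(demo_pool)), n_demos)))
        if idx not in seen:  # keep only subsets not drawn before
            seen.add(idx)
            trials.append([demo_pool[i] for i in idx])
    return trials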
Hardware Specification | Yes | All experiments were performed on a single laptop running macOS Mojave with a 2.9 GHz Intel Core i9 processor and 32 GB of memory.
Software Dependencies | No | The paper mentions using 'an off-the-shelf stochastic greedy decision-tree learner (Pedregosa et al. 2011)', which refers to scikit-learn, but it does not specify version numbers for scikit-learn or any other software libraries or programming languages used.
Experiment Setup | Yes | LPP learning is run for 10,000 iterations for each task. The network has 8 convolutional layers with kernel size 3, stride 1, padding 1, 4 channels (8 in the input layer), and ReLU nonlinearities. The architecture is: 64-channel convolution; max pooling; 64-channel fully-connected layer; |A|-channel fully-connected layer. All kernels have size 3 and all strides and paddings are 1.
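For concreteness, here is a PyTorch sketch of the second architecture quoted above (64-channel convolution, max pooling, 64-unit fully-connected layer, |A|-unit output layer, all kernels of size 3 with stride and padding 1). The framework, input channel count, and grid size are assumptions; the quoted setup does not state them.

import torch
import torch.nn as nn

class CNNPolicyBaseline(nn.Module):
    """Sketch of the CNN baseline: conv(64) -> max pool -> FC(64) -> FC(|A|).
    `in_channels` and `grid_size` are assumed hyperparameters."""
    def __init__(self, in_channels, grid_size, num_actions):
        super().__init__()
        # Kernel size 3 with stride 1 and padding 1 preserves spatial size.
        self.conv = nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * grid_size * grid_size, 64)
        self.fc2 = nn.Linear(64, num_actions)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x)
        x = torch.relu(self.fc1(x.flatten(start_dim=1)))
        return self.fc2(x)  # one logit per action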