Option Transfer and SMDP Abstraction with Successor Features
Authors: Dongge Han, Sebastian Tschiatschek
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate our proposed abstraction scheme for option transfer to new environments with known and unknown transition dynamics, and planning with abstract SMDPs. Table 1: Performance and efficiency of option grounding in Object Rooms. We show the success rate (success) of the learned options for achieving all specified goals across all starting states in the initiation set, and the number of LPs used to find the option policy. Figure 3: Performance of planning with the abstract MDPs. |
| Researcher Affiliation | Academia | 1University of Oxford, Department of Computer Science, Oxford, United Kingdom 2University of Vienna, Faculty of Computer Science, Vienna, Austria |
| Pseudocode | Yes | Algorithm 1 Option Grounding (IRL-NAIVE) |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of the source code for the methodology described. |
| Open Datasets | No | The paper evaluates on custom 'Object-Rooms Setting' and 'Bake-Rooms Setting (Minecraft)' environments without providing concrete access information (link, DOI, or formal citation) that would make them available as public datasets. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Malmo, a standard AI research platform using Minecraft', but does not provide specific version numbers for Malmo or any other software dependencies like libraries or frameworks. |
| Experiment Setup | Yes | Each agent runs for 20 iterations, each consisting of 200 steps. In the first iteration, all agents execute random actions. After each iteration, the agents construct an MDP graph based on collected transitions from all prior iterations. The eigenoption agent computes 3 eigenoptions of smallest eigenvalues using the normalized graph Laplacian, while IRL-BATCH grounds the 3 abstract options: 1. open and go to door, 2. collect coal, 3. collect potato. In the next iteration, agents perform a random walk with both primitive actions and the acquired options, update the MDP graph, compute new options, and so on. |
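The Experiment Setup row above describes an eigenoption baseline that rebuilds an MDP graph from all collected transitions after each iteration and computes the 3 eigenoptions associated with the smallest eigenvalues of the normalized graph Laplacian. The sketch below illustrates that step only, under common assumptions from the eigenoption literature (discrete states, an undirected unweighted state graph, and the intrinsic reward r(s, s') = e(s') - e(s) per eigenvector); all function names and the toy data are illustrative, not the authors' code.

```python
# Hypothetical sketch of the eigenoption computation described in the
# Experiment Setup row. Not the authors' implementation.
import numpy as np


def build_adjacency(transitions, num_states):
    """Symmetric 0/1 adjacency matrix from observed (state, next_state) pairs."""
    A = np.zeros((num_states, num_states))
    for s, s_next in transitions:
        if s != s_next:
            A[s, s_next] = 1.0
            A[s_next, s] = 1.0
    return A


def eigenoption_rewards(A, k=3):
    """Return the k eigenvectors of the normalized graph Laplacian with the
    smallest eigenvalues; each eigenvector e defines one eigenoption via the
    intrinsic reward r(s, s') = e[s'] - e[s]."""
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                                   # guard isolated states
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt      # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)                  # eigh sorts eigenvalues ascending
    return [eigvecs[:, i] for i in range(k)]


if __name__ == "__main__":
    # Toy data standing in for transitions collected over the 200-step iterations.
    transitions = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
    A = build_adjacency(transitions, num_states=4)
    for i, e in enumerate(eigenoption_rewards(A, k=3)):
        print(f"eigenoption {i}:", np.round(e, 3))
```

In the loop described in the paper, this computation would be rerun after every iteration on the graph built from all transitions gathered so far, and the resulting options would then be added to the agent's action set for the next random walk.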