
Zero-Shot Transfer with Deictic Object-Oriented Representation in Reinforcement Learning

Authors: Ofir Marom, Benjamin Rosman

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conducted two sets of experiments on this domain. In the first set we have one destination and we fix the number of passengers, n. We generate a grounded MDP with an initial state by randomly sampling n passenger locations and one destination location from one of six pre-specified locations and we also sample a random taxi start location together with one of four wall configurations as shown in Figure 1a. We apply 20 independent runs of the following procedure: we sample 10 test MDPs with random initial states. We then randomly sample a training MDP and run DOORMAXD on it for one episode until we reach the terminal state.
Researcher Affiliation	Academia	1 University of the Witwatersrand, Johannesburg, South Africa; 2 Council for Scientific and Industrial Research, Pretoria, South Africa
Pseudocode	Yes	Algorithm 1: DOORMAXD: learning procedure for C.α and a.
Open Source Code	No	The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets	No	The paper describes using the 'all-passenger any-destination Taxi domain' and the 'Sokoban domain', and generating instances from these. However, it does not provide concrete access information (link, DOI, specific repository, or formal citation with authors/year for specific dataset instances) to make these generated datasets publicly available or reproducible by others without re-implementing the generation process.
Dataset Splits	No	The paper mentions sampling '10 test MDPs' and 'a training MDP' but does not specify a separate validation split, nor does it provide percentages or exact counts for any validation set.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory specifications).
Software Dependencies	No	The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	We apply 20 independent runs of the following procedure: we sample 10 test MDPs with random initial states. We then randomly sample a training MDP and run DOORMAXD on it for one episode until we reach the terminal state. Upon termination, we test performance by running DOORMAXD for one episode on each of the 10 test MDPs, stopping an episode early if we exceed 500 steps. We repeat this for 100 training MDPs. Since all the MDPs come from the same schema we can share transition dynamics between our MDPs but we only update the transition dynamics on training MDPs. In our experiments we start with n = 1 passenger and increase to n = 4 passengers. We run our experiments for Propositional OO-MDPs and two versions of Deictic OO-MDPs.
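
Since no code is released, the evaluation protocol quoted under Research Type and Experiment Setup can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `TaxiInstance`, `sample_instance`, `run_protocol`, and the `run_episode` agent interface are hypothetical names, the 5x5 grid size is an assumption, and so is sampling locations independently with replacement. The learner itself (DOORMAXD) is not implemented here; any object exposing the assumed `run_episode` method can be plugged in.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol, Tuple

NUM_LOCATIONS = 6       # six pre-specified locations (from the quoted text)
NUM_WALL_CONFIGS = 4    # four wall configurations (from the quoted text)
MAX_TEST_STEPS = 500    # test episodes stop early after 500 steps (from the quoted text)
GRID_SIZE = 5           # ASSUMPTION: grid dimensions are not stated in this summary


@dataclass(frozen=True)
class TaxiInstance:
    """Hypothetical container for the initial state of one grounded MDP."""
    passengers: Tuple[int, ...]   # index of each passenger's start location
    destination: int              # index of the single destination location
    taxi_start: Tuple[int, int]   # (row, col) of the taxi
    wall_config: int              # index of the wall layout


def sample_instance(n_passengers: int, rng: random.Random) -> TaxiInstance:
    """Sample an initial state; sampling with replacement is an assumption."""
    return TaxiInstance(
        passengers=tuple(rng.randrange(NUM_LOCATIONS) for _ in range(n_passengers)),
        destination=rng.randrange(NUM_LOCATIONS),
        taxi_start=(rng.randrange(GRID_SIZE), rng.randrange(GRID_SIZE)),
        wall_config=rng.randrange(NUM_WALL_CONFIGS),
    )


class Agent(Protocol):
    """Hypothetical learner interface; returns the number of steps taken."""
    def run_episode(self, instance: TaxiInstance, learn: bool,
                    max_steps: Optional[int]) -> int: ...


def run_protocol(make_agent: Callable[[], Agent], n_passengers: int,
                 runs: int = 20, n_test: int = 10, n_train: int = 100,
                 seed: int = 0) -> List[List[float]]:
    """Return, per run, the mean test steps after each training MDP."""
    rng = random.Random(seed)
    curves: List[List[float]] = []
    for _ in range(runs):
        agent = make_agent()  # ASSUMPTION: each independent run starts from a fresh model
        test_set = [sample_instance(n_passengers, rng) for _ in range(n_test)]
        curve: List[float] = []
        for _ in range(n_train):
            train = sample_instance(n_passengers, rng)
            # One training episode, run until the terminal state; dynamics are updated.
            agent.run_episode(train, learn=True, max_steps=None)
            # Test on every held-out MDP without updating the shared dynamics.
            steps = [agent.run_episode(t, learn=False, max_steps=MAX_TEST_STEPS)
                     for t in test_set]
            curve.append(sum(steps) / len(steps))
        curves.append(curve)
    return curves
```

Calling `run_protocol(make_agent, n_passengers=n)` for n = 1 through 4 would mirror the passenger sweep described above. Keeping `learn=False` during testing reflects the statement that transition dynamics are shared across MDPs but updated only on training MDPs.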