Influence-Augmented Online Planning for Complex Environments

Authors: Jinke He, Miguel Suau de Castro, Frans Oliehoek

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment. We perform online planning experiments with the POMCP planner (Silver and Veness, 2010).
Researcher Affiliation | Academia | Jinke He, Department of Intelligent Systems, Delft University of Technology, J.He-4@tudelft.nl; Miguel Suau, Department of Intelligent Systems, Delft University of Technology, M.SuaudeCastro@tudelft.nl; Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology, F.A.Oliehoek@tudelft.nl
Pseudocode | Yes | Algorithm 1: Influence-Augmented Online Planning (a hedged sketch of the planning loop is given after the table)
Open Source Code | Yes | Our codebase was implemented in C++, including a POMCP planner and several benchmarking domains available at https://github.com/INFLUENCEorg/IAOP
Open Datasets | No | The paper describes creating datasets by sampling from a global simulator ('To obtain an approximate influence predictor Î_θ, we sample a dataset D of 1000 episodes from the global simulator G_global'), but does not provide access information for a publicly available or open dataset.
Dataset Splits | No | The paper mentions training an RNN ('train a variant of RNN called Gated Recurrent Units (GRU) on D until convergence') but does not provide specific details on dataset splits for training, validation, or testing.
Hardware Specification | No | The paper states 'We ran each of our experiments for many times on a computer cluster with the same amount of computational resources' but does not provide specific hardware details such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper mentions 'Our codebase was implemented in C++' and training a 'Gated Recurrent Units (GRU)' but does not provide specific version numbers for any software libraries, frameworks, or compilers used.
Experiment Setup | Yes | We perform planning with different simulators in games of {5, 9, 17, 33, 65, 129} agents for a horizon of 10 steps, where a fixed number of 1000 Monte Carlo simulations are performed per step. To obtain an approximate influence predictor Î_θ, we sample a dataset D of 1000 episodes from the global simulator G_global with a uniform random policy and train a variant of RNN called Gated Recurrent Units (GRU) (Cho et al., 2014) on D until convergence. The traffic light in the center is controlled by planning, with the goal to minimize the total number of vehicles in this intersection for a horizon of 30 steps. We train an influence predictor with an RNN and evaluate the performance of all three simulators G^random_IALM, G^θ_IALM, and G_global in settings where the allowed planning time is fixed per step. (A training sketch for the influence predictor follows the table.)
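
The planning loop summarized above (POMCP searching on the fast influence-augmented local simulator, then acting in the real environment) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: the names `pomcp_search`, `local_sim`, and the belief-update interface are hypothetical stand-ins for components of the paper's C++ codebase.

```python
def iaop_episode(env, local_sim, pomcp_search, horizon=30, n_sims=1000):
    """One online-planning episode: at each step, run POMCP on the
    influence-augmented local simulator, then execute the chosen
    action in the real (global) environment."""
    # Particles over local states, each paired with the recurrent state of
    # the learned influence predictor that stands in for the rest of the world.
    belief = local_sim.initial_belief()
    obs = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        # Plan: a fixed budget of Monte Carlo simulations per step,
        # all of which step only the cheap local simulator.
        action = pomcp_search(local_sim, belief, n_simulations=n_sims)
        # Act: the real environment advances, and the belief is filtered
        # forward with the executed action and the received observation.
        obs, reward = env.step(action)
        belief = local_sim.update_belief(belief, action, obs)
        total_reward += reward
    return total_reward
```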
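
For the influence predictor itself, the paper only states that a GRU (Cho et al., 2014) is trained until convergence on 1000 episodes sampled with a uniform random policy; architecture sizes, loss, and optimizer settings are not reported. A PyTorch sketch under assumed shapes (categorical influence-source values, cross-entropy loss) might look like this:

```python
import torch
import torch.nn as nn

class InfluencePredictor(nn.Module):
    """GRU mapping a local history to per-step distributions over the
    influence-source values; all dimensions are illustrative assumptions."""
    def __init__(self, obs_dim, hidden_dim, n_influence_values):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_influence_values)

    def forward(self, history):            # history: (batch, time, obs_dim)
        out, _ = self.gru(history)
        return self.head(out)              # logits: (batch, time, n_values)

def train_predictor(predictor, loader, epochs=50, lr=1e-3):
    """Supervised training on episodes sampled from the global simulator;
    targets are the influence-source values observed along each episode."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for histories, targets in loader:  # targets: (batch, time), int64
            logits = predictor(histories)
            loss = loss_fn(logits.flatten(0, 1), targets.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
```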