Influence-Augmented Online Planning for Complex Environments

Authors: Jinke He, Miguel Suau de Castro, Frans Oliehoek

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment. We perform online planning experiments with the POMCP planner (Silver and Veness, 2010).
Researcher Affiliation | Academia | Jinke He, Department of Intelligent Systems, Delft University of Technology, J.He-4@tudelft.nl; Miguel Suau, Department of Intelligent Systems, Delft University of Technology, M.SuaudeCastro@tudelft.nl; Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology, F.A.Oliehoek@tudelft.nl
Pseudocode | Yes | Algorithm 1: Influence-Augmented Online Planning (a hedged sketch of the planning loop is given after the table)
Open Source Code | Yes | Our codebase was implemented in C++, including a POMCP planner and several benchmarking domains available at https://github.com/INFLUENCEorg/IAOP
Open Datasets | No | The paper describes creating datasets by sampling from a global simulator ('To obtain an approximate influence predictor Î_θ, we sample a dataset D of 1000 episodes from the global simulator G_global'), but does not provide access information for a publicly available or open dataset.
Dataset Splits | No | The paper mentions training an RNN ('train a variant of RNN called Gated Recurrent Units (GRU) on D until convergence') but does not provide specific details on dataset splits for training, validation, or testing.
Hardware Specification | No | The paper states 'We ran each of our experiments for many times on a computer cluster with the same amount of computational resources' but does not provide specific hardware details such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper mentions 'Our codebase was implemented in C++' and training a 'Gated Recurrent Units (GRU)' but does not provide specific version numbers for any software libraries, frameworks, or compilers used.
Experiment Setup | Yes | We perform planning with different simulators in games of {5, 9, 17, 33, 65, 129} agents for a horizon of 10 steps, where a fixed number of 1000 Monte Carlo simulations are performed per step. To obtain an approximate influence predictor Î_θ, we sample a dataset D of 1000 episodes from the global simulator G_global with a uniform random policy and train a variant of RNN called Gated Recurrent Units (GRU) (Cho et al., 2014) on D until convergence. The traffic light in the center is controlled by planning, with the goal to minimize the total number of vehicles in this intersection for a horizon of 30 steps. We train an influence predictor with an RNN and evaluate the performance of all three simulators G^random_IALM, G^θ_IALM, and G_global in settings where the allowed planning time is fixed per step. (A training sketch for the influence predictor follows the table.)
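
The planning loop summarized above (POMCP searching on the fast influence-augmented local simulator, then acting in the real environment) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: the names `pomcp_search`, `local_sim`, and the belief-update interface are hypothetical stand-ins for components of the paper's C++ codebase.

```python
def iaop_episode(env, local_sim, pomcp_search, horizon=30, n_sims=1000):
    """One online-planning episode: at each step, run POMCP on the
    influence-augmented local simulator, then execute the chosen
    action in the real (global) environment."""
    # Particles over local states, each paired with the recurrent state of
    # the learned influence predictor that stands in for the rest of the world.
    belief = local_sim.initial_belief()
    obs = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        # Plan: a fixed budget of Monte Carlo simulations per step,
        # all of which step only the cheap local simulator.
        action = pomcp_search(local_sim, belief, n_simulations=n_sims)
        # Act: the real environment advances, and the belief is filtered
        # forward with the executed action and the received observation.
        obs, reward = env.step(action)
        belief = local_sim.update_belief(belief, action, obs)
        total_reward += reward
    return total_reward
```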
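
For the influence predictor itself, the paper only states that a GRU (Cho et al., 2014) is trained until convergence on 1000 episodes sampled with a uniform random policy; architecture sizes, loss, and optimizer settings are not reported. A PyTorch sketch under assumed shapes (categorical influence-source values, cross-entropy loss) might look like this:

```python
import torch
import torch.nn as nn

class InfluencePredictor(nn.Module):
    """GRU mapping a local history to per-step distributions over the
    influence-source values; all dimensions are illustrative assumptions."""
    def __init__(self, obs_dim, hidden_dim, n_influence_values):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_influence_values)

    def forward(self, history):            # history: (batch, time, obs_dim)
        out, _ = self.gru(history)
        return self.head(out)              # logits: (batch, time, n_values)

def train_predictor(predictor, loader, epochs=50, lr=1e-3):
    """Supervised training on episodes sampled from the global simulator;
    targets are the influence-source values observed along each episode."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for histories, targets in loader:  # targets: (batch, time), int64
            logits = predictor(histories)
            loss = loss_fn(logits.flatten(0, 1), targets.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
```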