Influence-Augmented Online Planning for Complex Environments
Authors: Jinke He, Miguel Suau de Castro, Frans Oliehoek
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment. We perform online planning experiments with the POMCP planner (Silver and Veness, 2010) |
| Researcher Affiliation | Academia | Jinke He, Department of Intelligent Systems, Delft University of Technology, J.He-4@tudelft.nl; Miguel Suau, Department of Intelligent Systems, Delft University of Technology, M.SuaudeCastro@tudelft.nl; Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology, F.A.Oliehoek@tudelft.nl |
| Pseudocode | Yes | Algorithm 1: Influence-Augmented Online Planning (a hedged sketch of this planning loop follows the table) |
| Open Source Code | Yes | Our codebase was implemented in C++, including a POMCP planner and several benchmarking domains available at https://github.com/INFLUENCEorg/IAOP |
| Open Datasets | No | The paper describes creating datasets by sampling from a global simulator ('To obtain an approximate influence predictor Î_θ, we sample a dataset D of 1000 episodes from the global simulator G_global'), but does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions training an RNN ('train a variant of RNN called Gated Recurrent Units (GRU) on D until convergence') but does not provide specific details on dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper states 'We ran each of our experiments for many times on a computer cluster with the same amount of computational resources' but does not provide specific hardware details such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'Our codebase was implemented in C++' and training a 'Gated Recurrent Units (GRU)' but does not provide specific version numbers for any software libraries, frameworks, or compilers used. |
| Experiment Setup | Yes | We perform planning with different simulators in games of {5, 9, 17, 33, 65, 129} agents for a horizon of 10 steps, where a fixed number of 1000 Monte Carlo simulations are performed per step. To obtain an approximate influence predictor Î_θ, we sample a dataset D of 1000 episodes from the global simulator G_global with a uniform random policy and train a variant of RNN called Gated Recurrent Units (GRU) (Cho et al., 2014) on D until convergence. The traffic light in the center is controlled by planning, with the goal to minimize the total number of vehicles in this intersection for a horizon of 30 steps. We train an influence predictor with an RNN and evaluate the performance of all three simulators G^random_IALM, G^θ_IALM and G_global in settings where the allowed planning time is fixed per step. (The dataset-sampling and predictor-training step is sketched below the table.) |
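
To make the quoted dataset-creation and training procedure concrete, the sketch below mirrors the described pipeline: sample episodes from a global simulator under a uniform random policy, then fit a GRU that predicts the influence source values from the local action-observation history. It is not the authors' C++ codebase; the `GlobalSimulator`, its dimensions, the MSE loss, and the fixed epoch count (standing in for "until convergence") are illustrative assumptions.

```python
# Hedged sketch of the dataset-sampling and influence-predictor training step.
# The GlobalSimulator below is a toy stand-in, not the paper's environment.

import torch
import torch.nn as nn


class GlobalSimulator:
    """Toy stand-in for the full (global) environment simulator."""

    def __init__(self, n_actions=4, obs_dim=6, influence_dim=3, horizon=10):
        self.n_actions = n_actions
        self.obs_dim = obs_dim
        self.influence_dim = influence_dim
        self.horizon = horizon

    def sample_episode(self):
        """Roll out one episode under a uniform random policy.

        Returns the local action-observation history and, per step, the
        influence source values the predictor must learn to forecast.
        """
        local_history, influence_targets = [], []
        for _ in range(self.horizon):
            obs = torch.randn(self.obs_dim)              # local observation
            action = torch.randint(self.n_actions, ())   # uniform random policy
            influence = torch.randn(self.influence_dim)  # influence source value
            one_hot_action = nn.functional.one_hot(action, self.n_actions).float()
            local_history.append(torch.cat([obs, one_hot_action]))
            influence_targets.append(influence)
        return torch.stack(local_history), torch.stack(influence_targets)


class GRUInfluencePredictor(nn.Module):
    """GRU mapping a local action-observation history to predicted influence values."""

    def __init__(self, input_dim, influence_dim, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, influence_dim)

    def forward(self, histories):
        hidden_states, _ = self.gru(histories)           # (batch, time, hidden)
        return self.head(hidden_states)                  # (batch, time, influence_dim)


def train_influence_predictor(n_episodes=1000, epochs=50):
    sim = GlobalSimulator()
    # D: dataset of episodes sampled from the global simulator.
    dataset = [sim.sample_episode() for _ in range(n_episodes)]
    histories = torch.stack([h for h, _ in dataset])
    targets = torch.stack([t for _, t in dataset])

    model = GRUInfluencePredictor(input_dim=histories.shape[-1],
                                  influence_dim=targets.shape[-1])
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()  # placeholder loss; the paper's influence sources may be discrete

    for _ in range(epochs):                              # fixed epochs stand in for "until convergence"
        optimiser.zero_grad()
        loss = loss_fn(model(histories), targets)
        loss.backward()
        optimiser.step()
    return model


if __name__ == "__main__":
    predictor = train_influence_predictor()
```

The design point carried over from the paper is that the predictor conditions only on the local history, so the resulting local simulator never needs to query the full environment at planning time.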
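The "Pseudocode" row refers to Algorithm 1 (Influence-Augmented Online Planning). The sketch below shows the overall shape of that loop: plan on an influence-augmented local simulator with a fixed simulation budget per step, sampling the influence source values from a predictor instead of stepping the global model. For brevity it uses a flat Monte Carlo planner rather than the authors' C++ POMCP implementation, and the `LocalSimulator` dynamics, reward, and uniform influence predictor are placeholder assumptions.

```python
# Hedged sketch of the influence-augmented planning loop (cf. Algorithm 1).
# Flat Monte Carlo planning is used here in place of POMCP for brevity.

import random


class LocalSimulator:
    """Influence-augmented local model (IALM): steps only the local region and
    samples the boundary ("influence source") variables from a predictor."""

    def __init__(self, influence_predictor):
        self.influence_predictor = influence_predictor

    def step(self, state, action, history):
        # Sample the influence source value conditioned on the local history,
        # then apply local dynamics; both are toy stand-ins here.
        influence = self.influence_predictor(history)
        next_state = (state + action + influence) % 10
        reward = -abs(next_state - 5)            # toy objective
        return next_state, reward


def uniform_influence_predictor(history):
    # Placeholder for a trained GRU predictor Î_θ; here just a uniform sample.
    return random.randint(0, 1)


def plan_one_step(simulator, state, history, actions, n_simulations=1000, horizon=10):
    """Pick an action via Monte Carlo rollouts on the local simulator, with a
    fixed budget of simulations per step (1000 per step in the paper)."""
    returns = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(n_simulations):
        first_action = random.choice(actions)
        s, h, total = state, list(history), 0.0
        a = first_action
        for _ in range(horizon):
            s, r = simulator.step(s, a, h)
            h.append((a, s))
            total += r
            a = random.choice(actions)           # random rollout policy
        returns[first_action] += total
        counts[first_action] += 1
    return max(actions, key=lambda a: returns[a] / max(counts[a], 1))


if __name__ == "__main__":
    sim = LocalSimulator(uniform_influence_predictor)
    state, history = 0, []
    actions = [0, 1, 2, 3]
    for t in range(30):                          # 30-step episode, as in the traffic experiment
        action = plan_one_step(sim, state, history, actions)
        # In the real experiments the chosen action is executed in the global
        # environment; here the toy local simulator doubles as it.
        state, reward = sim.step(state, action, history)
        history.append((action, state))
```

Note that the paper's real-time comparison fixes the allowed planning time per step rather than the simulation count; swapping `n_simulations` for a wall-clock budget would reflect that setting.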