Bayesian Experience Reuse for Learning from Multiple Demonstrators

Authors: Mike Gimelfarb, Scott Sanner, Chi-Guhn Lee

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate its effectiveness for minimizing multi-modal functions, and optimizing a high-dimensional supply chain with cost uncertainty, where it is also shown to improve upon the performance of the demonstrators' policies. ... 4 Empirical Evaluation: In order to demonstrate the effectiveness of BERS, we consider two problems: (1) the search for the minimum of static but high-dimensional multi-modal functions, and (2) the dynamic control of a complex supply chain network with stochastic demand.
Researcher Affiliation | Academia | Michael Gimelfarb, Scott Sanner and Chi-Guhn Lee. Department of Mechanical and Industrial Engineering, University of Toronto. mike.gimelfarb@mail.utoronto.ca, ssanner@mie.utoronto.ca, cglee@mie.utoronto.ca. Affiliate to Vector Institute, Toronto, Canada.
Pseudocode | Yes | Algorithm 1 Bayesian Experience Reuse (BERS)
Open Source Code | Yes | The appendix can be found at https://github.com/mikegimelfarb/bayesian-experience-reuse.
Open Datasets | Yes | More specifically, we use the 10-dimensional Rosenbrock, Ackley and sphere functions as source tasks, and the Rastrigin function as the target task (please see appendix for definitions and processing).
Dataset Splits | No | No explicit train/validation/test dataset splits (percentages, counts, or specific predefined splits) are mentioned in the main text.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments are provided in the paper.
Software Dependencies | No | The paper mentions using "DDPG [Lillicrap et al., 2016]" as the base learning agent but does not provide specific version numbers for DDPG or any other software libraries.
Experiment Setup | Yes | The search is limited to x_i ∈ [-4, 4] for all i = 1, 2, ..., 10. The global minimums of the functions are: x_Rosenbrock = 1, x_Ackley = 0, x_Sphere = 2 and x_Rastrigin = 2. ... The factory can manufacture up to 35 units of inventory per day, and the factory and the warehouses can each store up to 50 units of inventory at any given time. ... Demand for each warehouse A, B, ..., F, in units per day, is Poisson-distributed with respective means {7, 6, 6, 5, 5, 5}. ... This leads to a 2 + K + K^2 = 44-dimensional continuous action space.
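
The experiment setup quoted above can be illustrated with a short Python sketch. It is not the authors' code. The four functions are written in their standard textbook forms, while the paper's exact definitions and processing are in its appendix and may differ. The `shift` parameter is an assumption, added only so that the minimizer can be moved to the coordinate quoted above (read here as every coordinate taking that value). The value K = 6 is inferred from the six warehouses A to F, and `sample_poisson` and `sample_daily_demand` are hypothetical helper names.

```python
import math
import random

DIM = 10                    # dimensionality quoted in the setup
LOWER, UPPER = -4.0, 4.0    # search box quoted in the setup


def sphere(x, shift=0.0):
    """Textbook sphere function; minimum value 0 where every x_i == shift."""
    return sum((xi - shift) ** 2 for xi in x)


def rosenbrock(x):
    """Textbook Rosenbrock function; minimum value 0 where every x_i == 1."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def ackley(x, shift=0.0):
    """Textbook Ackley function; minimum value 0 where every x_i == shift."""
    z = [xi - shift for xi in x]
    mean_sq = sum(zi ** 2 for zi in z) / len(z)
    mean_cos = sum(math.cos(2.0 * math.pi * zi) for zi in z) / len(z)
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)


def rastrigin(x, shift=0.0):
    """Textbook Rastrigin function; minimum value 0 where every x_i == shift."""
    z = [xi - shift for xi in x]
    return 10.0 * len(z) + sum(
        zi ** 2 - 10.0 * math.cos(2.0 * math.pi * zi) for zi in z
    )


# Supply chain quantities quoted in the setup.
DEMAND_MEANS = {"A": 7, "B": 6, "C": 6, "D": 5, "E": 5, "F": 5}
K = len(DEMAND_MEANS)        # assumption: K counts the six warehouses
ACTION_DIM = 2 + K + K ** 2  # 2 + 6 + 36


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_daily_demand(rng):
    """Sample one day of demand (in units) for each warehouse."""
    return {name: sample_poisson(mean, rng) for name, mean in DEMAND_MEANS.items()}


if __name__ == "__main__":
    print("action dimension:", ACTION_DIM)
    print("rosenbrock at x_i = 1:", rosenbrock([1.0] * DIM))
    print("one day of demand:", sample_daily_demand(random.Random(0)))
```

The standard library is used throughout so the sketch runs without extra dependencies. A NumPy version would be shorter but would add an unversioned dependency of the kind the table flags as missing from the paper.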