Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction

Authors: Riccardo De Santi, Federico Arangath Joseph, Noah Liniger, Mirco Mutti, Andreas Krause

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we perform a thorough experimental evaluation of GAE analysing its statistical and computational efficiency on two tasks where the unknown quantity f represents: (1) the amount of pollutant emerging from a point source (see Fig. 1), and (2) the toxicity of chemical compounds generated from a set of base elements (see Fig. 2f).
Researcher Affiliation | Academia | 1 Department of Computer Science, ETH Zurich, Zurich, Switzerland; 2 ETH AI Center, Zurich, Switzerland; 3 Technion, Haifa, Israel.
Pseudocode | Yes | Algorithm 1 Geometric Active Exploration (GAE)
Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository.
Open Datasets | No | The paper describes the construction of simulated environments and data within the paper (e.g., 'S = 240 states' for pollutant diffusion and 'S = 363 states' for chemical compounds) but does not provide access information (link, DOI, specific citation) for a publicly available dataset.
Dataset Splits | No | The paper describes simulated environments and data generation processes, but does not specify training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper mentions that computational time was measured but does not specify the exact hardware (CPU, GPU models, or specific machine configurations) used for the experiments.
Software Dependencies | No | The paper mentions 'Python' and a 'standard time library in Python' but does not provide specific version numbers for these or any other key software dependencies (e.g., machine learning frameworks, solvers, or additional libraries).
Experiment Setup | Yes | The smoothness parameter was chosen to be η = 0.001, and δ = 0.01 for both deterministic and stochastic dynamics. Furthermore, we found that in practice, a constant number of interactions τ_k = τ for all the K iterations of GAE works well, especially for remarkably low τ. In this setting, we chose τ = 3... To update the abstract state-action frequency λ_{k+1}, we also use a constant update step of 0.005/S. The initial state of the agent was chosen on the outermost circle. ... n as 210, resulting in K = 70 iterations of GAE. All the experiments were repeated over 15 random seeds.
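
The reported setup can be collected into a small configuration sketch that makes the derived quantities explicit. This is not the authors' code (none was released); the class name `GAESetup` and its field names are hypothetical, and the relation K = n / τ is an assumption that treats n as the total interaction budget, which is consistent with the reported n = 210, τ = 3, and K = 70.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAESetup:
    """Hypothetical container for the hyperparameters reported in the summary."""

    num_states: int           # S: 240 (pollutant task) or 363 (chemical compounds task)
    eta: float = 0.001        # smoothness parameter
    delta: float = 0.01       # delta, same for deterministic and stochastic dynamics
    tau: int = 3              # constant number of interactions per GAE iteration
    sample_budget: int = 210  # n, assumed here to be the total number of interactions
    num_seeds: int = 15       # each experiment repeated over 15 random seeds

    @property
    def num_iterations(self) -> int:
        # Assumption: K = n / tau, which reproduces the reported K = 70.
        if self.sample_budget % self.tau != 0:
            raise ValueError("sample_budget must be divisible by tau")
        return self.sample_budget // self.tau

    @property
    def step_size(self) -> float:
        # Constant update step 0.005 / S for the abstract state-action frequency.
        return 0.005 / self.num_states


pollutant = GAESetup(num_states=240)
chemistry = GAESetup(num_states=363)

print(pollutant.num_iterations, pollutant.step_size)
print(chemistry.num_iterations, chemistry.step_size)
```

Under these assumptions both tasks run K = 70 iterations, with an update step of roughly 2.08e-5 for S = 240 and roughly 1.38e-5 for S = 363.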