Submodular Reinforcement Learning

Authors: Manish Prajapat, Mojmír Mutný, Melanie N. Zeilinger, Andreas Krause

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We showcase the versatility of our approach by applying SUBPO to several applications such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.
Researcher Affiliation | Academia | Manish Prajapat (ETH Zurich), Mojmír Mutný (ETH Zurich), Melanie N. Zeilinger (ETH Zurich), Andreas Krause (ETH Zurich)
Pseudocode | Yes | Algorithm 1: Submodular Policy Optimization (SUBPO)
Open Source Code | Yes | Code available at https://github.com/manish-pra/non-additive-RL
Open Datasets | Yes | We simulate a bio-diversity monitoring task, where we aim to cover areas with a high density of gorilla nests with a quadrotor in the Kagwene Gorilla Sanctuary (Fig. 1a). ... Let ρ : V → ℝ be the nest density obtained by fitting a smooth rate function (Mutný & Krause, 2021) over Gorilla nest counts (Funwi-gabga & Mateu, 2011). ... For instances where we utilized randomly sampled environments, such as coverage with GP samples, gorilla nest density, or item collection environment, we have included the corresponding environment files in the attached code for easy reference.
Dataset Splits | No | The paper mentions running experiments, epochs, and multiple runs but does not specify clear training, validation, or test dataset splits with percentages or counts.
Hardware Specification | No | This takes roughly 1 hour of training for a single-core CPU. ... This takes roughly 6 hours of training for a single-core CPU. The paper mentions 'single-core CPU' but lacks specific details such as the processor model, manufacturer, or clock speed needed to reproduce the timings.
Software Dependencies | No | We implemented all algorithms in Pytorch and will make the code and the videos public. The paper mentions 'Pytorch' but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | The agent's policy was parameterized by a two-layer multi-layer perceptron, consisting of 64 neurons in each layer. The non-linearity in the network was induced by employing the Rectified Linear Unit (ReLU) activation function. By employing a stochastic policy, the agent generated a categorical distribution over the action set for each state. Subsequently, this distribution was passed through a softmax probability function. We employed a batch size of B = 500 and a low entropy coefficient of α = 0 or 0.005, depending on the specific characteristics of the environment.
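
The experiment-setup row above pins down the policy architecture (two hidden layers of 64 ReLU units, categorical/softmax output over a discrete action set). The following is a minimal PyTorch sketch reconstructed from that description, not an excerpt of the released code; the class name PolicyNet, state_dim, and num_actions are illustrative placeholders, and states are assumed to be encoded as feature vectors (e.g. one-hot).

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    """Two-layer MLP policy: 64 ReLU units per hidden layer, stochastic
    categorical output over a discrete action set (softmax over logits)."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # unnormalized action logits
        )

    def forward(self, state: torch.Tensor) -> Categorical:
        logits = self.net(state)
        # Categorical(logits=...) applies the softmax internally, giving the
        # "categorical distribution over the action set" described above.
        return Categorical(logits=logits)
```

In use, an action would be drawn as `dist = policy(state_features); action = dist.sample()`.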
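The table also reports the optimization settings (batch size B = 500, entropy coefficient α ∈ {0, 0.005}) and names Algorithm 1 (SUBPO) without reproducing it. The sketch below is only a generic, hedged illustration of training such a policy on a submodular trajectory objective: each step is credited with the marginal gain of a set function F over the visited states, and a REINFORCE-style update with an entropy bonus is applied. The env.reset()/env.step() interface, the one-hot state encoding, the horizon, and reading B as a number of episodes are all assumptions, not details taken from the paper or its code.

```python
import torch


def one_hot(state_idx: int, num_states: int) -> torch.Tensor:
    """Illustrative state encoding: a one-hot vector over a finite state set."""
    x = torch.zeros(num_states)
    x[state_idx] = 1.0
    return x


def training_step(policy, optimizer, env, F, num_states,
                  batch_size=500, alpha=0.005, horizon=40):
    """One REINFORCE-style update on a submodular trajectory objective.

    Assumptions (not from the paper's code): env.reset()/env.step() return
    integer state indices, F maps a set of visited states to a scalar, the
    horizon of 40 is a placeholder, and B = 500 is read as a number of
    sampled episodes. Each step is rewarded with the marginal gain of F.
    """
    episode_losses = []
    for _ in range(batch_size):
        state = env.reset()
        visited = {state}
        value = F(visited)
        log_probs, entropies, gains = [], [], []
        for _ in range(horizon):
            dist = policy(one_hot(state, num_states))
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            entropies.append(dist.entropy())
            state = env.step(action.item())
            visited.add(state)
            new_value = F(visited)
            gains.append(new_value - value)  # marginal gain of F at this step
            value = new_value
        # Credit each action with the gains collected from that step onward.
        returns = torch.tensor(gains).flip(0).cumsum(0).flip(0)
        episode_losses.append(
            -(torch.stack(log_probs) * returns).sum()
            - alpha * torch.stack(entropies).sum()  # entropy bonus, alpha in {0, 0.005}
        )
    loss = torch.stack(episode_losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An optimizer such as torch.optim.Adam(policy.parameters()) would be constructed once outside the loop; the paper's Algorithm 1 should be consulted for the exact objective and update it prescribes.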