Metrics and Continuity in Reinforcement Learning

Authors: Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro
Pages: 8261-8269

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
Researcher Affiliation | Collaboration | Charline Le Lan*¹, Marc G. Bellemare², Pablo Samuel Castro²; ¹University of Oxford, ²Google Research, Brain Team; charline.lelan@stats.ox.ac.uk, {bellemare,psc}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code used to produce all these experiments is open-sourced. Code available at https://github.com/google-research/google-research/tree/master/rl_metrics_aaai2021
Open Datasets | No | We conduct our experiments on Garnet MDPs, which are a class of randomly generated MDPs (Archibald, McKinnon, and Thomas 1995; Piot, Geist, and Pietquin 2014). Specifically, a Garnet MDP Garnet(n_S, n_A) is parameterized by two values: the number of states n_S and the number of actions n_A, and is generated as follows: 1. The branching factor b_{s,a} of each transition P_s^a is sampled uniformly from [1 : n_S]. 2. b_{s,a} states are picked uniformly randomly from S and assigned a random value in [0, 1]; these values are then normalized to produce a proper distribution P_s^a. 3. Each R_s^a is sampled uniformly in [0, 1].
Dataset Splits | No | The paper describes subsampling of states for evaluation but does not provide the training/validation/test splits that would normally be needed for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software packages such as NumPy, TensorFlow, SciPy, Matplotlib, and Gin-Config, but does not provide specific version numbers for them.
Experiment Setup | Yes | We conduct our experiments on Garnet MDPs... Averaged over 100 Garnet MDPs with 200 states and 5 actions, with 50 independent runs for each... For each metric, we perform 10 different aggregations using a k-median algorithm, ranging from one aggregate state to 200 aggregate states. Specifically, given a subsampling fraction f ∈ [0, 1], we sample |S| f states and call this set κ.
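
The Garnet generation steps and the state subsampling quoted above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' released code. The function names `make_garnet` and `subsample_states` are hypothetical, and three details are assumptions not stated in the quoted text: next states are drawn without replacement, random weights are kept strictly positive so normalization is always defined, and the subset size |S| f is rounded up to an integer.

```python
import math
import random


def make_garnet(n_states, n_actions, seed=None):
    """Sample a Garnet(n_states, n_actions) MDP following the three quoted steps.

    Returns (P, R): P[s][a] is a list of n_states next-state probabilities,
    and R[s][a] is a reward in [0, 1].
    """
    rng = random.Random(seed)
    P = [[None] * n_actions for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # Step 1: branching factor sampled uniformly from {1, ..., n_states}.
            b = rng.randint(1, n_states)
            # Step 2: pick b next states (assumption: without replacement),
            # assign each a random weight, then normalize to a distribution.
            next_states = rng.sample(range(n_states), b)
            # Assumption: weights lie in (0, 1] so their sum is never zero.
            weights = [1.0 - rng.random() for _ in next_states]
            total = sum(weights)
            row = [0.0] * n_states
            for s_next, w in zip(next_states, weights):
                row[s_next] = w / total
            P[s][a] = row
            # Step 3: reward sampled uniformly in [0, 1].
            R[s][a] = rng.random()
    return P, R


def subsample_states(n_states, fraction, seed=None):
    """Sample the subset kappa of states for a subsampling fraction in [0, 1].

    Assumption: the subset size |S| * f is rounded up to an integer.
    """
    rng = random.Random(seed)
    k = min(n_states, math.ceil(n_states * fraction))
    return sorted(rng.sample(range(n_states), k))
```

Under these assumptions, `make_garnet(200, 5, seed=0)` would produce one MDP of the size reported in the Experiment Setup row, and `subsample_states(200, f)` would give the index set κ for a chosen fraction `f`.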