Metrics and Continuity in Reinforcement Learning
Authors: Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro (pp. 8261-8269)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered. |
| Researcher Affiliation | Collaboration | Charline Le Lan (1, *), Marc G. Bellemare (2), Pablo Samuel Castro (2). (1) University of Oxford, (2) Google Research, Brain Team. charline.lelan@stats.ox.ac.uk, {bellemare,psc}@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code used to produce all these experiments is open-sourced. (Footnote 2: Code available at https://github.com/google-research/google-research/tree/master/rl_metrics_aaai2021) |
| Open Datasets | No | We conduct our experiments on Garnet MDPs, which are a class of randomly generated MDPs (Archibald, McKinnon, and Thomas 1995; Piot, Geist, and Pietquin 2014). Specifically, a Garnet MDP $\text{Garnet}(n_S, n_A)$ is parameterized by two values: the number of states $n_S$ and the number of actions $n_A$, and is generated as follows: 1. The branching factor $b_{s,a}$ of each transition $P^a_s$ is sampled uniformly from $[1 : n_S]$. 2. $b_{s,a}$ states are picked uniformly randomly from $S$ and assigned a random value in $[0, 1]$; these values are then normalized to produce a proper distribution $P^a_s$. 3. Each $R^a_s$ is sampled uniformly in $[0, 1]$. |
| Dataset Splits | No | The paper describes subsampling of states for evaluation but does not provide specific training/test/validation dataset splits needed for reproducibility in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software packages such as NumPy, TensorFlow, SciPy, Matplotlib, and Gin-Config, but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We conduct our experiments on Garnet MDPs... Averaged over 100 Garnet MDPs with 200 states and 5 actions, with 50 independent runs for each... For each metric, we perform 10 different aggregations using a k-median algorithm, ranging from one aggregate state to 200 aggregate states. Specifically, given a subsampling fraction $f \in [0, 1]$, we sample $\lvert S \rvert \cdot f$ states and call this set $\kappa$. |
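
The Garnet generation steps quoted in the Open Datasets row, and the state subsampling quoted in the Experiment Setup row, can be illustrated with a short sketch. This is not the authors' released code. The function names `make_garnet` and `subsample_states` are hypothetical. Drawing the successor states without replacement and rounding the sample size up are assumptions that the quoted text does not confirm.

```python
import math
import random


def make_garnet(n_states, n_actions, seed=None):
    """Sample one Garnet(n_states, n_actions) MDP.

    Returns (P, R): P[s][a] is a list of n_states next-state probabilities,
    and R[s][a] is a scalar reward in [0, 1].
    """
    rng = random.Random(seed)
    P = [[None] * n_actions for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # Step 1: branching factor drawn uniformly from {1, ..., n_states}.
            b = rng.randint(1, n_states)
            # Step 2: pick b successor states (assumed distinct), give each a
            # random weight, then normalize the weights into a distribution.
            successors = rng.sample(range(n_states), b)
            # 1 - random() lies in (0, 1], so the normalizer is never zero.
            weights = [1.0 - rng.random() for _ in successors]
            total = sum(weights)
            row = [0.0] * n_states
            for nxt, w in zip(successors, weights):
                row[nxt] = w / total
            P[s][a] = row
            # Step 3: reward drawn uniformly from [0, 1).
            R[s][a] = rng.random()
    return P, R


def subsample_states(n_states, fraction, seed=None):
    """Return a sorted random subset of state indices (the set kappa).

    The subset size is ceil(n_states * fraction); rounding up is an assumption.
    """
    rng = random.Random(seed)
    k = math.ceil(n_states * fraction)
    return sorted(rng.sample(range(n_states), k))


if __name__ == "__main__":
    # Mirrors the quoted setup size: 200 states and 5 actions.
    P, R = make_garnet(200, 5, seed=0)
    kappa = subsample_states(200, 0.1, seed=0)
    print(len(P), len(P[0]), len(kappa))
```

The sketch uses nested lists to stay dependency-free. An array library such as NumPy, which the paper reports using, would be the natural choice at the scale of 100 MDPs with 50 runs each.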