reproducibilityindex.ai

Multi-Reward Best Policy Identification

Authors: Alessio Russo, Filippo Vannella

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the performance of MR-NaS on different hard-exploration tabular environments, comparing to RF-UCRL [22] (a reward-free exploration method), ID3AL [33] (a maximum entropy exploration approach) and MR-PSRL, a multi-reward adaptation of PSRL [35]. Results demonstrate the efficiency of MR-NaS in identifying optimal policies across various rewards and in generalizing to unseen rewards when the reward set is sufficiently diverse.
Researcher Affiliation	Industry	Alessio Russo (Ericsson AB, Stockholm, Sweden); Filippo Vannella (Ericsson Research, Stockholm, Sweden)
Pseudocode	Yes	Algorithm 1 MR-NaS (Multiple Rewards Navigate and Stop) Require: Confidence δ; exploration terms (α, β); reward vectors R.
Open Source Code	Yes	Code repository: https://github.com/rssalessio/Multi-Reward-Best-Policy-Identification
Open Datasets	Yes	We evaluate the performance of MR-NaS on different hard-exploration tabular environments: Riverswim [54], Forked Riverswim [53], Double Chain [22] and NArms [54] (an adaptation of Six Arms to N arms). We compare MR-NaS against RF-UCRL [22] (a reward-free exploration method), ID3AL [33] (a maximum entropy exploration approach) and MR-PSRL, a multi-reward adaptation of PSRL [35].
Dataset Splits	No	To assess DBMR-BPI's capacity to generalize on unseen rewards, we uniformly sample 5 additional values of x0 in the same interval that are not used during training, and we denote them by Rrnd.
Hardware Specification	Yes	For these simulations we used 1 G5.4XLARGE AWS instance with 16 vCPUs, 64 GiB of memory and 1 A10G GPU with 24 GiB of memory. To obtain all the results 2-3 days are needed. The entire research project needed roughly 15 days of computation time for this experiment.
Software Dependencies	Yes	We set up our experiments using Python 3.11 [88] (for more information, please refer to the following link http://www.python.org), and made use of the following libraries: NumPy [89], SciPy [90], CVXPY [91], Seaborn [92], Pandas [93], Matplotlib [94]. In CVXPY we used the CLARABEL optimizer [95] and/or the ECOS optimizer [96].
Experiment Setup	Yes	The parameters are listed in tab. 6. Refer to app. E for further details on the parameters and the algorithms.
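
To reproduce the experiments, it may help to first check a local environment against the Software Dependencies row. The snippet below is a minimal sketch, not part of the authors' repository. It makes three assumptions: each library's import name matches its lowercase package name, `clarabel` and `ecos` are the import names of the two solvers, and Python 3.11 is a minimum rather than an exact requirement. The helper names `missing_modules` and `check_environment` are hypothetical.

```python
import sys
from importlib.util import find_spec

# Import names for the libraries listed in the table.
# Assumption: each import name matches the lowercase package name.
REQUIRED = ["numpy", "scipy", "cvxpy", "seaborn", "pandas", "matplotlib"]

# The table says CLARABEL "and/or" ECOS, so either solver is treated as enough.
# Assumption: the solvers are importable as `clarabel` and `ecos`.
SOLVERS = ["clarabel", "ecos"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


def check_environment(min_python=(3, 11)):
    """Summarize whether the interpreter and libraries look usable.

    `min_python` is a hypothetical threshold: the source only states that
    Python 3.11 was used, not that it is a strict minimum.
    """
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "missing": missing_modules(REQUIRED),
        "solver_available": len(missing_modules(SOLVERS)) < len(SOLVERS),
    }


if __name__ == "__main__":
    print(check_environment())
```

The check only locates modules without importing them, so it is quick but does not confirm that versions match those used in the paper; the code repository listed above is the authoritative reference for exact requirements.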