Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

Authors: Amarildo Likmeta, Matteo Sacco, Alberto Maria Metelli, Marcello Restelli

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we evaluate our algorithm on standard MuJoCo tasks as well as a suite of continuous-actions domains, where exploration is crucial, in comparison with state-of-the-art baselines. Additional details and results can be found in the supplementary material with our arXiv preprint. In this section, we present the empirical evaluation of WAC in various continuous control domains. We start from simple 1D-navigation, where we can better visualize the effects of the Q-posteriors in the learning and exploration process. In Appendix B, we show an evaluation on several standard MuJoCo tasks, which show that this suite of environments does not pose significant exploration challenges. Hence, we focus our evaluation of WAC on a set of MuJoCo tasks specifically designed for exploration. Our results can be reproduced using the source code at https://github.com/amarildolikmeta/wac_explore. Figure 3 shows the results of these experiments. In Figure 4a, we present the average return as a function of the training epochs, whereas in Figure 4b we present the number of episodes completed in 3000 steps of interaction.
Researcher Affiliation | Academia | 1 University of Bologna, 2 Politecnico di Milano; amarildo.likmeta2@unibo.it, matteo3.sacco@mail.polimi.it, albertomaria.metelli@polimi.it, marcello.restelli@polimi.it
Pseudocode | Yes | Algorithm 1: Wasserstein Actor-Critic.
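To make the pseudocode entry concrete: WAC learns distributional estimates of the Q-function ("Q-posteriors") and directs exploration through optimism, i.e., by acting on an upper bound of that estimate. The snippet below is a minimal illustrative sketch of that optimism principle only, not the paper's Algorithm 1; the ensemble stand-in for the posterior, the toy 1D reward, and the names `beta` and `K` are assumptions introduced here for illustration.

```python
import numpy as np

# Hedged sketch: a small critic ensemble stands in for a Q-posterior,
# and an optimistic upper bound (mean + beta * std) drives action choice.
rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 201)   # discretized 1D action space
true_q = -(actions - 0.5) ** 2          # toy Q-function, maximized at a = 0.5

K = 10                                  # assumed ensemble size
ensemble = np.stack(
    [true_q + 0.05 * rng.standard_normal(actions.shape) for _ in range(K)]
)

q_mean = ensemble.mean(axis=0)          # posterior mean estimate of Q
q_std = ensemble.std(axis=0)            # spread: a proxy for epistemic uncertainty

beta = 2.0                              # assumed optimism coefficient
upper = q_mean + beta * q_std           # optimistic upper bound on Q

# A greedy actor maximizes q_mean; an optimistic actor maximizes the
# upper bound, preferring actions whose value is still uncertain.
greedy_action = actions[np.argmax(q_mean)]
optimistic_action = actions[np.argmax(upper)]
```

The design point this illustrates is that the exploration bonus `beta * q_std` shrinks as the uncertainty collapses, so optimistic action selection reduces to greedy selection once the critic is confident.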
Open Source Code | Yes | Our results can be reproduced using the source code at https://github.com/amarildolikmeta/wac_explore.
Open Datasets | No | The paper mentions 'standard MuJoCo tasks', '1D navigation domains', and a '2D navigation task', which are environments/benchmarks rather than traditional datasets for which public access information (a direct link, DOI, or formal citation to a specific dataset source) would be provided. The environments are typically built within a simulator such as MuJoCo.
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits. It discusses training epochs and evaluation, but not the data partitioning typical of supervised learning contexts.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, or cloud instance details).
Software Dependencies | No | The paper does not list specific versions of software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries).
Experiment Setup | Yes | Details on the hyperparameter tuning are in Appendix A.