Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

Authors: Amarildo Likmeta, Matteo Sacco, Alberto Maria Metelli, Marcello Restelli

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we evaluate our algorithm on standard MuJoCo tasks as well as a suite of continuous-actions domains, where exploration is crucial, in comparison with state-of-the-art baselines. Additional details and results can be found in the supplementary material with our arXiv preprint. In this section, we present the empirical evaluation of WAC in various continuous control domains. We start from simple 1D-navigation, where we can better visualize the effects of the Q-posteriors in the learning and exploration process. In Appendix B, we show an evaluation on several standard MuJoCo tasks, which show that this suite of environments does not pose significant exploration challenges. Hence, we focus our evaluation of WAC on a set of MuJoCo tasks specifically designed for exploration. Our results can be reproduced using the source code at https://github.com/amarildolikmeta/wac_explore. Figure 3 shows the results of these experiments. In Figure 4a, we present the average return as a function of the training epochs, whereas in Figure 4b we present the number of episodes completed in 3000 steps of interaction.
Researcher Affiliation | Academia | 1 University of Bologna, 2 Politecnico di Milano; amarildo.likmeta2@unibo.it, matteo3.sacco@mail.polimi.it, albertomaria.metelli@polimi.it, marcello.restelli@polimi.it
Pseudocode | Yes | Algorithm 1: Wasserstein Actor-Critic.
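To make the pseudocode entry concrete: WAC learns distributional estimates of the Q-function ("Q-posteriors") and directs exploration through optimism, i.e., by acting on an upper bound of that estimate. The snippet below is a minimal illustrative sketch of that optimism principle only, not the paper's Algorithm 1; the ensemble stand-in for the posterior, the toy 1D reward, and the names `beta` and `K` are assumptions introduced here for illustration.

```python
import numpy as np

# Hedged sketch: a small critic ensemble stands in for a Q-posterior,
# and an optimistic upper bound (mean + beta * std) drives action choice.
rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 201)   # discretized 1D action space
true_q = -(actions - 0.5) ** 2          # toy Q-function, maximized at a = 0.5

K = 10                                  # assumed ensemble size
ensemble = np.stack(
    [true_q + 0.05 * rng.standard_normal(actions.shape) for _ in range(K)]
)

q_mean = ensemble.mean(axis=0)          # posterior mean estimate of Q
q_std = ensemble.std(axis=0)            # spread: a proxy for epistemic uncertainty

beta = 2.0                              # assumed optimism coefficient
upper = q_mean + beta * q_std           # optimistic upper bound on Q

# A greedy actor maximizes q_mean; an optimistic actor maximizes the
# upper bound, preferring actions whose value is still uncertain.
greedy_action = actions[np.argmax(q_mean)]
optimistic_action = actions[np.argmax(upper)]
```

The design point this illustrates is that the exploration bonus `beta * q_std` shrinks as the uncertainty collapses, so optimistic action selection reduces to greedy selection once the critic is confident.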
Open Source Code | Yes | Our results can be reproduced using the source code at https://github.com/amarildolikmeta/wac_explore.
Open Datasets | No | The paper mentions 'standard MuJoCo tasks', '1D navigation domains', and a '2D navigation task', which are environments/benchmarks rather than traditional datasets for which public access information (a direct link, DOI, or formal citation to a specific dataset source) would be provided. The environments are typically built within a simulator such as MuJoCo.
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits. It discusses training epochs and evaluation, but not the data partitioning typical of supervised learning contexts.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, or cloud instance details).
Software Dependencies | No | The paper does not list specific versions of software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries).
Experiment Setup | Yes | Details on the hyperparameter tuning are in Appendix A.