Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration

Authors: Seungyul Han, Youngchul Sung

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the proposed DAC algorithm on various continuous-action control tasks and provide ablation study. We first consider the pure exploration performance and then the performance on challenging sparse-reward or delayed Mujoco tasks.
Researcher Affiliation | Academia | 1Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
Pseudocode | Yes | Algorithm 1 Diversity Actor Critic
Open Source Code | Yes | The source code of DAC based on Python Tensorflow is available at http://github.com/seungyulhan/dac/.
Open Datasets | Yes | Mujoco (Todorov et al., 2012) in OpenAI Gym (Brockman et al., 2016); the maze environment was designed by modifying a continuous grid map available at https://github.com/huyaoyu/GridMap.
Dataset Splits | No | The paper discusses evaluation methods such as "deterministic evaluation" and averaging over random seeds, but it does not provide explicit details on train/validation/test dataset splits, such as percentages or sample counts.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or cloud computing instances.
Software Dependencies | No | The paper mentions "Python Tensorflow" for the source code, but it does not provide specific version numbers for these software components or any other libraries.
Experiment Setup | Yes | For DAC, we use a single learning rate for all networks as 3e-4. We use two Q-functions and a value function. The Q-network and value network have two hidden layers with 256 units and ReLU activation. The policy network has two hidden layers with 256 units and ReLU activation and a tanh output layer. The ratio network has two hidden layers with 256 units and ReLU activation and a sigmoid output layer.
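
The reported setup can be summarized as a small TensorFlow sketch. The hidden sizes, activations, and the 3e-4 learning rate come from the quoted text above; the `mlp` helper, the Keras `Sequential` models, and the input/output dimensions are illustrative assumptions, not the authors' released implementation.

```python
import tensorflow as tf

def mlp(hidden_units, output_units, output_activation=None):
    """Two ReLU hidden layers followed by one output layer."""
    return tf.keras.Sequential(
        [tf.keras.layers.Dense(u, activation="relu") for u in hidden_units]
        + [tf.keras.layers.Dense(output_units, activation=output_activation)]
    )

obs_dim, act_dim = 17, 6          # hypothetical dimensions (e.g. a Mujoco task)

# Two Q-functions and one value function: 2 x 256 ReLU hidden layers, linear output.
q1 = mlp([256, 256], 1)
q2 = mlp([256, 256], 1)
value = mlp([256, 256], 1)

# Policy network: 2 x 256 ReLU hidden layers with a tanh output layer.
# (The released code may parameterize a squashed Gaussian; only the quoted
# description is reproduced here.)
policy = mlp([256, 256], act_dim, output_activation="tanh")

# Ratio network: 2 x 256 ReLU hidden layers with a sigmoid output layer.
ratio = mlp([256, 256], 1, output_activation="sigmoid")

# A single learning rate of 3e-4 is reported for all networks.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)

# Example forward pass on a dummy state-action batch.
state = tf.zeros([1, obs_dim])
action = policy(state)
q_value = q1(tf.concat([state, action], axis=-1))
```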