Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration
Authors: Seungyul Han, Youngchul Sung
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the proposed DAC algorithm on various continuous-action control tasks and provide an ablation study. We first consider the pure exploration performance and then the performance on challenging sparse-reward or delayed Mujoco tasks. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea. |
| Pseudocode | Yes | Algorithm 1: Diversity Actor-Critic |
| Open Source Code | Yes | The source code of DAC, based on Python TensorFlow, is available at http://github.com/seungyulhan/dac/. |
| Open Datasets | Yes | Mujoco (Todorov et al., 2012) in OpenAI Gym (Brockman et al., 2016); the maze environment was designed by modifying a continuous grid map available at https://github.com/huyaoyu/GridMap. |
| Dataset Splits | No | The paper discusses evaluation methods such as "deterministic evaluation" and averaging over random seeds, but it does not provide explicit details on train/validation/test dataset splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or cloud computing instances. |
| Software Dependencies | No | The paper mentions "Python Tensorflow" for the source code, but it does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | For DAC, we use a single learning rate of 3e-4 for all networks. We use two Q-functions and a value function. The Q-network and value network each have two hidden layers with 256 units and ReLU activation. The policy network has two hidden layers with 256 units and ReLU activation, with a tanh output layer. The ratio network has two hidden layers with 256 units and ReLU activation, with a sigmoid output layer. |
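
For context, the network architecture quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' released code: the paper reports a Python TensorFlow implementation, but the tf.keras layer API, the placeholder state/action dimensions, and the helper name `mlp` below are assumptions made for this sketch.

```python
# Minimal sketch of the DAC network architecture reported in the paper:
# two hidden layers of 256 ReLU units per network, a tanh output for the
# policy, a sigmoid output for the ratio network, and a shared 3e-4
# learning rate. Dimensions are hypothetical placeholders.

import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM = 17   # hypothetical observation size for a Mujoco task
ACTION_DIM = 6   # hypothetical action size

def mlp(output_dim, output_activation=None):
    """Two hidden layers with 256 units and ReLU activation, per the paper."""
    return tf.keras.Sequential([
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(output_dim, activation=output_activation),
    ])

# Two Q-functions and one value function (scalar outputs, linear heads).
q1 = mlp(1)
q2 = mlp(1)
value = mlp(1)

# Policy network: tanh output layer squashes actions into [-1, 1].
policy = mlp(ACTION_DIM, output_activation="tanh")

# Ratio network: sigmoid output layer keeps the estimate in (0, 1).
ratio = mlp(1, output_activation="sigmoid")

# Single learning rate of 3e-4 shared by all networks.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)
```

In a SAC-style setup, a Q-value estimate for a state-action pair would then be computed as, e.g., `q1(tf.concat([state, action], axis=-1))`, while the value, policy, and ratio networks take the state alone.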