Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration
Authors: Seungyul Han, Youngchul Sung
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the proposed DAC algorithm on various continuous-action control tasks and provide an ablation study. We first consider the pure exploration performance and then the performance on challenging sparse-reward or delayed Mujoco tasks. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea. |
| Pseudocode | Yes | Algorithm 1: Diversity Actor-Critic |
| Open Source Code | Yes | The source code of DAC, based on Python TensorFlow, is available at http://github.com/seungyulhan/dac/. |
| Open Datasets | Yes | Mujoco (Todorov et al., 2012) in OpenAI Gym (Brockman et al., 2016); the maze environment was designed by modifying a continuous grid map available at https://github.com/huyaoyu/GridMap. |
| Dataset Splits | No | The paper discusses evaluation methods such as "deterministic evaluation" and averaging over random seeds, but it does not provide explicit details on train/validation/test dataset splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or cloud computing instances. |
| Software Dependencies | No | The paper mentions "Python Tensorflow" for the source code, but it does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | For DAC, we use a single learning rate of 3e-4 for all networks. We use two Q-functions and a value function. The Q-network and value network each have two hidden layers with 256 units and ReLU activation. The policy network has two hidden layers with 256 units and ReLU activation, with a tanh output layer. The ratio network has two hidden layers with 256 units and ReLU activation, with a sigmoid output layer. |
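
For context, the network architecture quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' released code: the paper reports a Python TensorFlow implementation, but the tf.keras layer API, the placeholder state/action dimensions, and the helper name `mlp` below are assumptions made for this sketch.

```python
# Minimal sketch of the DAC network architecture reported in the paper:
# two hidden layers of 256 ReLU units per network, a tanh output for the
# policy, a sigmoid output for the ratio network, and a shared 3e-4
# learning rate. Dimensions are hypothetical placeholders.

import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM = 17   # hypothetical observation size for a Mujoco task
ACTION_DIM = 6   # hypothetical action size

def mlp(output_dim, output_activation=None):
    """Two hidden layers with 256 units and ReLU activation, per the paper."""
    return tf.keras.Sequential([
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(output_dim, activation=output_activation),
    ])

# Two Q-functions and one value function (scalar outputs, linear heads).
q1 = mlp(1)
q2 = mlp(1)
value = mlp(1)

# Policy network: tanh output layer squashes actions into [-1, 1].
policy = mlp(ACTION_DIM, output_activation="tanh")

# Ratio network: sigmoid output layer keeps the estimate in (0, 1).
ratio = mlp(1, output_activation="sigmoid")

# Single learning rate of 3e-4 shared by all networks.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)
```

In a SAC-style setup, a Q-value estimate for a state-action pair would then be computed as, e.g., `q1(tf.concat([state, action], axis=-1))`, while the value, policy, and ratio networks take the state alone.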