State-Conditioned Adversarial Subgoal Generation
Authors: Vivienne Huiling Wang, Joni Pajarinen, Tinghuai Wang, Joni-Kristian Kämäräinen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks. ... This section evaluates and compares our method against standard RL and prior HRL methods in challenging environments ... Our experiments are designed to answer the following questions: ... |
| Researcher Affiliation | Collaboration | Vivienne Huiling Wang1,2, Joni Pajarinen2, Tinghuai Wang3, Joni-Kristian Kämäräinen1 1 Computing Sciences, Tampere University, Finland 2 Department of Electrical Engineering and Automation, Aalto University, Finland 3 Huawei Helsinki Research Center, Finland |
| Pseudocode | No | The paper does not include a clearly labeled pseudocode block or an algorithm block. The methods are described through mathematical formulations and textual explanations. |
| Open Source Code | No | The paper provides links to the official implementations of HRAC and LESSON (baselines) in footnotes, e.g., 'We use HRAC's official implementation https://github.com/trzhang0116/HRAC'. However, it does not provide any concrete access information (link, statement of release, or supplementary material) for the source code of its own proposed method, SAGA. |
| Open Datasets | Yes | We consider the following five environments for our analysis: 1. Ant Maze: ... 2. Ant Maze Sparse: ... 3. Ant Gather: ... 4. Ant Push: ... 5. Ant Fall: ... |
| Dataset Splits | No | The paper describes a reinforcement learning setup where agents interact with environments. It does not specify explicit training, validation, and test dataset splits in terms of percentages or sample counts, as would be common in supervised learning tasks with static datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions algorithms used, such as 'TD3' and 'Adam optimizer', but it does not specify concrete software dependencies with version numbers (e.g., Python version, specific deep learning frameworks like PyTorch or TensorFlow with their versions, or other libraries). |
| Experiment Setup | Yes | Specifically, we adopt two networks comprising three fully-connected layers with ReLU nonlinearities as the actor and critic networks of both low-level and high-level TD3 networks. The size of the hidden layers of both actor and critic is 300. The output of the high-level actor is activated using the tanh function and scaled according to the size of the environments. The subgoal generator network has the same architecture as the high-level actor. For the subgoal discriminator network, we use a network consisting of 3 fully-connected layers (sizes 300, 300 and 1 respectively) with Leaky-ReLU (negative slope 0.2) nonlinearities and a sigmoid function in all tasks. The Adam optimizer is used for all networks. ... We empirically study the effect of different coefficients of adversarial loss αadv. Fig. 4 shows that SAGA with three coefficients of adversarial loss 0.0005, 0.001 and 0.0015 shows asymptotically similar results and generally αadv = 0.001 gives better performance across three tasks; we use αadv = 0.001 for all the tasks presented in the paper. (A sketch of this architecture follows the table.) |
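
The quoted setup pins down layer sizes and activations but not the framework. Below is a minimal, hypothetical PyTorch-style sketch of those network shapes, assuming PyTorch (the paper does not state its software stack); class names, input/output dimensions, and the learning rate are illustrative placeholders, not values from the paper.

```python
# Hypothetical sketch of the network sizes quoted above; PyTorch is assumed,
# and all dimensions marked "illustrative" are not specified in the paper.
import torch
import torch.nn as nn


class ActorCritic300(nn.Module):
    """Three fully-connected layers with ReLU nonlinearities, hidden size 300."""

    def __init__(self, in_dim, out_dim, tanh_scale=None):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, 300), nn.ReLU(),
            nn.Linear(300, 300), nn.ReLU(),
            nn.Linear(300, out_dim),
        )
        # The high-level actor output is tanh-activated and scaled to the environment size.
        self.tanh_scale = tanh_scale

    def forward(self, x):
        out = self.body(x)
        if self.tanh_scale is not None:
            out = torch.tanh(out) * self.tanh_scale
        return out


class SubgoalDiscriminator(nn.Module):
    """3 FC layers (300, 300, 1) with Leaky-ReLU (slope 0.2) and a sigmoid output."""

    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 300), nn.LeakyReLU(0.2),
            nn.Linear(300, 300), nn.LeakyReLU(0.2),
            nn.Linear(300, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


# Adam is used for all networks; alpha_adv = 0.001 weights the adversarial loss term.
high_actor = ActorCritic300(in_dim=32, out_dim=2, tanh_scale=8.0)  # dims illustrative
discriminator = SubgoalDiscriminator(in_dim=34)                     # dim illustrative
optimizer = torch.optim.Adam(high_actor.parameters(), lr=3e-4)      # lr illustrative
alpha_adv = 0.001
```

The sketch only mirrors what the quoted text states (layer counts, widths, activations, optimizer choice, and the chosen αadv); everything else needed to reproduce SAGA, such as training schedules and framework versions, remains unreported in the paper.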