Deep Reinforcement Learning in Parameterized Action Space
Authors: Matthew Hausknecht, Peter Stone
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 covers experiments and results. Additionally, in Section 5 it states: "We evaluate the zeroing, squashing, and inverting gradient approaches in the parameterized HFO domain on the task of approaching the ball and scoring a goal. For each approach, we independently train two agents. All agents are trained for 3 million iterations, approximately 20,000 episodes of play. Training each agent took three days on a NVidia Titan-X GPU." |
| Researcher Affiliation | Academia | The authors' affiliations are listed as: "Matthew Hausknecht Department of Computer Science University of Texas at Austin mhauskn@cs.utexas.edu" and "Peter Stone Department of Computer Science University of Texas at Austin pstone@cs.utexas.edu", indicating an academic affiliation only. |
| Pseudocode | No | The paper describes algorithmic updates (e.g., "Updates to the critic network are largely unchanged from the standard temporal difference update..."), but it does not include explicitly labeled pseudocode blocks or algorithm listings within the text. |
| Open Source Code | Yes | Section 3.2 states: "Complete source code for our agent is available at https://github.com/mhauskn/dqn-hfo and for the HFO domain at https://github.com/mhauskn/HFO/." |
| Open Datasets | Yes | Section 3.2 states: "Complete source code for our agent is available at https://github.com/mhauskn/dqn-hfo and for the HFO domain at https://github.com/mhauskn/HFO/." The HFO domain acts as the environment from which data for training is generated, and its source code is publicly accessible. |
| Dataset Splits | No | The paper describes training agents for "3 million iterations, approximately 20,000 episodes of play" and evaluating them for "100 episodes". However, it does not specify explicit training, validation, or test dataset splits with percentages, sample counts, or predefined citations as typically found in supervised learning setups. |
| Hardware Specification | Yes | Section 5 states: "Training each agent took three days on a NVidia Titan-X GPU." |
| Software Dependencies | No | The paper mentions software components and algorithms such as "ADAM solver" and "ReLU activation function", but it does not provide specific version numbers for any libraries, frameworks, or solvers used (e.g., TensorFlow 2.0, PyTorch 1.9, etc.). |
| Experiment Setup | Yes | Section 3.2 describes the network architecture: "The 58 state inputs are processed by four fully connected layers consisting of 1024-512-256-128 units respectively. Each fully connected layer is followed by a rectified linear (ReLU) activation function with negative slope 10⁻². Weights of the fully connected layers use Gaussian initialization with a standard deviation of 10⁻². We use the ADAM solver with both actor and critic learning rate set to 10⁻³. Target networks track the actor and critic using a τ = 10⁻⁴." Section 4.1 also details exploration: "Experimentally, we anneal ϵ from 1.0 to 0.1 over the first 10,000 updates." |
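
The architecture and optimization details quoted in the Experiment Setup row translate fairly directly into code. Below is a minimal, hypothetical PyTorch sketch of the actor network, its optimizer, and the soft target-network update, assuming the 4 discrete actions and 6 continuous parameters of the HFO parameterized action space described in the paper. The class and variable names are illustrative assumptions, not taken from the authors' released dqn-hfo code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the actor described in Section 3.2:
# 58 state inputs -> 1024 -> 512 -> 256 -> 128 hidden units, each followed by a
# leaky ReLU with negative slope 10^-2; weights drawn from a Gaussian with std 10^-2.
class Actor(nn.Module):
    def __init__(self, state_dim=58, action_dim=4, param_dim=6):
        super().__init__()
        sizes = [state_dim, 1024, 512, 256, 128]
        layers = []
        for in_dim, out_dim in zip(sizes[:-1], sizes[1:]):
            fc = nn.Linear(in_dim, out_dim)
            nn.init.normal_(fc.weight, std=1e-2)  # Gaussian initialization, std 10^-2
            nn.init.zeros_(fc.bias)
            layers += [fc, nn.LeakyReLU(negative_slope=1e-2)]
        self.body = nn.Sequential(*layers)
        # Output head: 4 discrete-action activations plus 6 continuous parameters
        # (the HFO parameterized action space; an assumption for this sketch).
        self.head = nn.Linear(128, action_dim + param_dim)

    def forward(self, state):
        return self.head(self.body(state))


actor = Actor()
target_actor = Actor()
target_actor.load_state_dict(actor.state_dict())

# ADAM solver with learning rate 10^-3, as quoted above (critic omitted here).
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

# Target network tracks the learned network with tau = 10^-4.
def soft_update(target, source, tau=1e-4):
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.mul_(1.0 - tau).add_(tau * s_param.data)
```

The same pattern would apply to the critic network; only the input (state plus action) and the scalar Q-value output differ, which is why it is omitted from this sketch.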