Toward Efficient Gradient-Based Value Estimation
Authors: Arsalan Sharifnassab, Richard S. Sutton
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on a few classic control environments with neural network function approximation show significant improvement over RG and competitive performance with TD. |
| Researcher Affiliation | Academia | Authors are with the Department of Computing Science, University of Alberta, Canada. |
| Pseudocode | Yes | Algorithm 1 RAN |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the methodology. |
| Open Datasets | Yes | We ran an experiment on classic control tasks Acrobot and Cartpole to test the performance of the RANS algorithm. ... In another experiment, we evaluated the performance of RANS on simple MuJoCo environments Hopper and HalfCheetah. (See the environment-setup sketch after the table.) |
| Dataset Splits | No | The paper describes an online learning setting where samples are directly fed to training algorithms, rather than using traditional predefined validation splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'neural network', but does not specify any software libraries or dependencies with version numbers (e.g., TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The parameters used in the experiments are as follows. For RAN, we set α = 0.025, β = 0.4, and λ = 0.9998. For RG and TD(0) we used α = 0.5. ... For TD(0), we used softmax coefficient 1 and Adam optimizer with step-size 0.005. ... For RANS, we set α = 0.001 and all other parameters were set to their default values described in Algorithm 4. |
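
A minimal sketch collecting the hyperparameters quoted in the Experiment Setup row into one configuration. Only the numerical values come from the paper; the dictionary layout and key names (e.g., `softmax_coefficient`, `adam_step_size`) are our own labels, not taken from the authors' code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# The grouping and key names below are illustrative assumptions.
EXPERIMENT_CONFIG = {
    "RAN": {
        "alpha": 0.025,        # step-size
        "beta": 0.4,
        "lambda": 0.9998,
    },
    "RG": {
        "alpha": 0.5,
    },
    "TD0": {
        "alpha": 0.5,
        "softmax_coefficient": 1,
        "optimizer": "Adam",
        "adam_step_size": 0.005,
    },
    "RANS": {
        "alpha": 0.001,
        # All other parameters left at the defaults described in Algorithm 4 of the paper.
    },
}
```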
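
The tasks named in the Open Datasets row correspond to standard Gym-style environments. Below is a minimal sketch of how a reproduction might instantiate them; the `gymnasium` package and the exact environment IDs are assumptions, since the paper does not state which simulator package or versions were used.

```python
import gymnasium as gym

# Environment IDs are assumptions based on the task names in the paper.
CLASSIC_CONTROL_ENVS = ["Acrobot-v1", "CartPole-v1"]
MUJOCO_ENVS = ["Hopper-v4", "HalfCheetah-v4"]


def make_envs(env_ids):
    """Create one environment instance per ID."""
    return {env_id: gym.make(env_id) for env_id in env_ids}


if __name__ == "__main__":
    envs = make_envs(CLASSIC_CONTROL_ENVS + MUJOCO_ENVS)
    for env_id, env in envs.items():
        obs, info = env.reset(seed=0)
        print(env_id, "observation shape:", obs.shape)
```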