Toward Efficient Gradient-Based Value Estimation

Authors: Arsalan Sharifnassab, Richard S. Sutton

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our empirical results on a few classic control environments with neural network function approximation show significant improvement over RG, achieving performance competitive with TD." |
| Researcher Affiliation | Academia | "Authors are with the Department of Computing Science, University of Alberta, Canada." |
| Pseudocode | Yes | "Algorithm 1 RAN" |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the methodology. |
| Open Datasets | Yes | "We ran an experiment on classic control tasks Acrobot and Cartpole to test the performance of the RANS algorithm. ... In another experiment, we evaluated the performance of RANS on simple MuJoCo environments Hopper and HalfCheetah." |
| Dataset Splits | No | The paper describes an online learning setting in which samples are fed directly to the training algorithms, so no predefined train/validation splits are used. |
| Hardware Specification | No | The paper does not describe the specific hardware (e.g., CPU or GPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and neural networks, but does not specify any software libraries or dependencies with version numbers (e.g., TensorFlow or PyTorch versions). |
| Experiment Setup | Yes | "The parameters used in the experiments are as follows. For RAN, we set α = 0.025, β = 0.4, and λ = 0.9998. For RG and TD(0) we used α = 0.5. ... For TD(0), we used softmax coefficient 1 and Adam optimizer with step-size 0.005. ... For RANS, we set α = 0.001 and all other parameters were set to their default values described in Algorithm 4." |
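
For quick reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration. The sketch below is a plain Python mapping of those reported values; the key names and grouping are our own, and the RANS parameters deferred to the paper's Algorithm 4 defaults are deliberately left out.

```python
# Hyperparameters quoted in the Experiment Setup row above, gathered into one
# place. Key names and grouping are illustrative assumptions, not the paper's.
EXPERIMENT_CONFIG = {
    "RAN": {"alpha": 0.025, "beta": 0.4, "lambda": 0.9998},
    "RG": {"alpha": 0.5},
    "TD0": {
        "alpha": 0.5,
        "softmax_coefficient": 1,
        "optimizer": "Adam",
        "adam_step_size": 0.005,
    },
    # RANS: alpha as reported; the remaining parameters follow the defaults of
    # the paper's Algorithm 4 and are not reproduced here.
    "RANS": {"alpha": 0.001},
}

if __name__ == "__main__":
    for algo, params in EXPERIMENT_CONFIG.items():
        print(algo, params)
```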
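For context on the RG-versus-TD comparison referenced in the table, here is a minimal sketch of the two classical baselines the paper measures against: semi-gradient TD(0) and the residual-gradient (RG) method. It uses a toy tabular chain with one-hot features rather than the paper's neural networks, and it does not implement RAN or RANS; the chain environment and the reuse of the reported α = 0.5 here are illustrative assumptions.

```python
# Minimal sketch of the two baseline updates named in the report: semi-gradient
# TD(0) and residual gradient (RG). Toy chain MDP with one-hot (tabular)
# features; this is NOT the paper's neural-network setup and does not implement
# RAN/RANS. alpha = 0.5 matches the value quoted for RG and TD(0) above.
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.5
features = np.eye(n_states)  # one-hot features make this the tabular case

def value(w, s):
    return features[s] @ w

def td0_update(w, s, r, s_next):
    # Semi-gradient TD(0): the bootstrap target r + gamma * V(s') is treated
    # as a constant, so only the gradient of V(s) appears in the update.
    delta = r + gamma * value(w, s_next) - value(w, s)
    return w + alpha * delta * features[s]

def rg_update(w, s, r, s_next):
    # Residual gradient: exact gradient of 0.5 * delta^2, so the gradient of
    # V(s') also enters, scaled by -gamma.
    delta = r + gamma * value(w, s_next) - value(w, s)
    return w + alpha * delta * (features[s] - gamma * features[s_next])

rng = np.random.default_rng(0)
w_td, w_rg = np.zeros(n_states), np.zeros(n_states)
for _ in range(2000):
    s = int(rng.integers(n_states - 1))  # sample a non-terminal state
    s_next = s + 1                       # deterministic right-moving chain
    r = 1.0 if s_next == n_states - 1 else 0.0
    w_td = td0_update(w_td, s, r, s_next)
    w_rg = rg_update(w_rg, s, r, s_next)

print("TD(0) values:", np.round(w_td, 3))
print("RG values:   ", np.round(w_rg, 3))
```

The contrast this sketch makes concrete is the one underlying the paper's comparison: TD(0) ignores the gradient through the bootstrap target, while RG follows the true gradient of the squared TD error and therefore also adjusts the value of the successor state.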