A Kernel Loss for Solving the Bellman Equation
Authors: Yihao Feng, Lihong Li, Qiang Liu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our method (labelled K-loss in all experiments) with several representative baselines in both classic examples and popular benchmark problems, for both policy evaluation and optimization. ... Fig. 1 (b&c) show the learning curves of mean squared error $\lVert \hat{V} - V^\pi \rVert^2$ and weight error $\lVert \hat{w} - w^* \rVert$ of different algorithms over iterations. ... Fig. 2 summarizes the result using a neural network as value function for two metrics: $\lVert \hat{V} - V^\pi \rVert_2^2$ and $\lVert \hat{B}\hat{V} - \hat{V} \rVert_2^2$, both evaluated on the training transitions. (A small sketch of these two metrics appears below the table.) |
| Researcher Affiliation | Collaboration | Yihao Feng UT Austin yihao@cs.utexas.edu Lihong Li Google Research lihong@google.com Qiang Liu UT Austin lqiang@cs.utexas.edu |
| Pseudocode | No | The paper describes various algorithms but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'We use Trust-PCL (Nachum et al., 2018) framework and the public code for our experiments.' This refers to third-party code rather than an open-source release of the authors' own implementation. There is no explicit statement or link indicating that the code for K-loss is open source. |
| Open Datasets | No | The paper refers to environments like 'Puddle World', 'Cart Pole', 'Mountain Car', and 'Mujoco benchmark', which are standard reinforcement learning environments used for generating data. For the 'Modified example of Tsitsiklis & Van Roy', the paper states 'we randomly collect 2 000 transition tuples for training'. However, the paper does not provide concrete access information (e.g., link, DOI, specific repository, or formal citation with authors/year) for any pre-existing public datasets used or for the generated data. |
| Dataset Splits | No | The paper mentions 'training transitions' but does not specify explicit dataset splits (e.g., exact percentages or sample counts for training, validation, and test sets) or refer to citations for predefined splits. |
| Hardware Specification | No | The acknowledgment section states, 'This work is supported in part by NSF CRII 1830161 and NSF CAREER 1846421. We would like to acknowledge Google Cloud and Amazon Web Services (AWS) for their support.' While cloud services are mentioned, no specific hardware models (e.g., GPU or CPU types, memory) or instance configurations are provided. |
| Software Dependencies | No | The paper mentions using TensorFlow in Appendix B.1 and an Adam optimizer in Appendix B.2, but does not provide specific version numbers for these or any other software dependencies (e.g., 'TensorFlow 2.x', 'Python 3.x'). |
| Experiment Setup | Yes | For the Puddle World experiment: 'We use an Adam optimizer (Kingma & Ba, 2014) with a learning rate of $10^{-3}$. The batch size is 128. For the Gaussian RBF kernel, we select a bandwidth of h = 100 based on validation set.' And also: 'The critic and actor network use two hidden layers with 256 units and ReLU activation functions. We use Adam optimizer with learning rate 1e-4 for the actor and 5e-5 for the critic. The batch size is 128.' (Appendix B.3.1) A hedged configuration sketch based on these quoted values also follows the table. |
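
The two metrics quoted in the Research Type row, the mean squared value error $\lVert \hat{V} - V^\pi \rVert^2$ and the empirical Bellman residual $\lVert \hat{B}\hat{V} - \hat{V} \rVert^2$, can be estimated from evaluation states and training transitions. The snippet below is a minimal NumPy sketch of those two estimates; the function names, the averaging over sampled states, and the discount factor are assumptions of this sketch rather than details quoted from the paper.

```python
import numpy as np

def value_error(v_hat, v_pi):
    """Mean squared value error ||V_hat - V^pi||^2, averaged over evaluation states."""
    return float(np.mean((np.asarray(v_hat) - np.asarray(v_pi)) ** 2))

def bellman_residual(rewards, v_hat_s, v_hat_s_next, gamma=0.99):
    """Empirical Bellman residual ||B V_hat - V_hat||^2 on transition tuples (s, a, r, s').

    rewards      : observed rewards r
    v_hat_s      : V_hat evaluated at the sampled states s
    v_hat_s_next : V_hat evaluated at the successor states s'
    gamma        : discount factor (0.99 is an assumed value, not taken from the paper)
    """
    td = np.asarray(rewards) + gamma * np.asarray(v_hat_s_next) - np.asarray(v_hat_s)
    return float(np.mean(td ** 2))
```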
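For the Puddle World setup quoted in the Experiment Setup row (Adam with learning rate $10^{-3}$, batch size 128, Gaussian RBF kernel with bandwidth $h = 100$), the sketch below illustrates one way those values could be wired into a kernel Bellman loss of the form $\frac{1}{n^2}\sum_{i,j} R_i\, K(s_i, s_j)\, R_j$ over TD residuals $R_i = r_i + \gamma V(s_i') - V(s_i)$. Both that quadratic form and the exact placement of the bandwidth inside the kernel are assumptions of this sketch, not statements from the reproducibility table.

```python
import numpy as np

def gaussian_rbf_kernel(x, y, h=100.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / h).

    h = 100 is the bandwidth quoted for Puddle World; dividing the squared
    distance by h (rather than, e.g., 2 * h**2) is an assumption of this sketch.
    """
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    sq_dists = (np.sum(x ** 2, axis=1)[:, None]
                + np.sum(y ** 2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-sq_dists / h)

def kernel_bellman_loss(states, rewards, values, next_values, gamma=0.99, h=100.0):
    """Minibatch estimate of the assumed kernel Bellman loss:
    L_K = (1 / n^2) * sum_{i,j} R_i * K(s_i, s_j) * R_j,
    with TD residuals R_i = r_i + gamma * V(s_i') - V(s_i).
    """
    residuals = (np.asarray(rewards)
                 + gamma * np.asarray(next_values)
                 - np.asarray(values))
    K = gaussian_rbf_kernel(states, states, h=h)
    return float(residuals @ K @ residuals) / (len(residuals) ** 2)
```

Under the quoted configuration, this loss would be evaluated on minibatches of 128 transitions and minimized with Adam at a learning rate of $10^{-3}$.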