A Kernel Loss for Solving the Bellman Equation
Authors: Yihao Feng, Lihong Li, Qiang Liu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our method (labelled K-loss in all experiments) with several representative baselines in both classic examples and popular benchmark problems, for both policy evaluation and optimization. ... Fig. 1 (b&c) show the learning curves of mean squared error $\lVert \hat{V} - V^\pi \rVert^2$ and weight error $\lVert \hat{w} - w^* \rVert$ of different algorithms over iterations. ... Fig. 2 summarizes the result using a neural network as value function for two metrics: $\lVert \hat{V} - V^\pi \rVert_2^2$ and $\lVert \hat{B}\hat{V} - \hat{V} \rVert_2^2$, both evaluated on the training transitions. (A small sketch of these two metrics appears below the table.) |
| Researcher Affiliation | Collaboration | Yihao Feng UT Austin yihao@cs.utexas.edu Lihong Li Google Research lihong@google.com Qiang Liu UT Austin lqiang@cs.utexas.edu |
| Pseudocode | No | The paper describes various algorithms but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'We use Trust-PCL (Nachum et al., 2018) framework and the public code for our experiments.' This refers to third-party code rather than an open-source release of the authors' own implementation. There is no explicit statement or link indicating that the code for K-loss is open source. |
| Open Datasets | No | The paper refers to environments like 'Puddle World', 'Cart Pole', 'Mountain Car', and 'Mujoco benchmark', which are standard reinforcement learning environments used for generating data. For the 'Modified example of Tsitsiklis & Van Roy', the paper states 'we randomly collect 2 000 transition tuples for training'. However, the paper does not provide concrete access information (e.g., link, DOI, specific repository, or formal citation with authors/year) for any pre-existing public datasets used or for the generated data. |
| Dataset Splits | No | The paper mentions 'training transitions' but does not specify explicit dataset splits (e.g., exact percentages or sample counts for training, validation, and test sets) or refer to citations for predefined splits. |
| Hardware Specification | No | The acknowledgment section states, 'This work is supported in part by NSF CRII 1830161 and NSF CAREER 1846421. We would like to acknowledge Google Cloud and Amazon Web Services (AWS) for their support.' While cloud services are mentioned, no specific hardware models (e.g., GPU or CPU types, memory) or instance configurations are provided. |
| Software Dependencies | No | The paper mentions using TensorFlow in Appendix B.1 and an Adam optimizer in Appendix B.2, but does not provide specific version numbers for these or any other software dependencies (e.g., 'TensorFlow 2.x', 'Python 3.x'). |
| Experiment Setup | Yes | For the Puddle World experiment: 'We use an Adam optimizer (Kingma & Ba, 2014) with a learning rate of $10^{-3}$. The batch size is 128. For the Gaussian RBF kernel, we select a bandwidth of h = 100 based on validation set.' And also: 'The critic and actor network use two hidden layers with 256 units and ReLU activation functions. We use Adam optimizer with learning rate 1e-4 for the actor and 5e-5 for the critic. The batch size is 128.' (Appendix B.3.1) A hedged configuration sketch based on these quoted values also follows the table. |
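
The two metrics quoted in the Research Type row, the mean squared value error $\lVert \hat{V} - V^\pi \rVert^2$ and the empirical Bellman residual $\lVert \hat{B}\hat{V} - \hat{V} \rVert^2$, can be estimated from evaluation states and training transitions. The snippet below is a minimal NumPy sketch of those two estimates; the function names, the averaging over sampled states, and the discount factor are assumptions of this sketch rather than details quoted from the paper.

```python
import numpy as np

def value_error(v_hat, v_pi):
    """Mean squared value error ||V_hat - V^pi||^2, averaged over evaluation states."""
    return float(np.mean((np.asarray(v_hat) - np.asarray(v_pi)) ** 2))

def bellman_residual(rewards, v_hat_s, v_hat_s_next, gamma=0.99):
    """Empirical Bellman residual ||B V_hat - V_hat||^2 on transition tuples (s, a, r, s').

    rewards      : observed rewards r
    v_hat_s      : V_hat evaluated at the sampled states s
    v_hat_s_next : V_hat evaluated at the successor states s'
    gamma        : discount factor (0.99 is an assumed value, not taken from the paper)
    """
    td = np.asarray(rewards) + gamma * np.asarray(v_hat_s_next) - np.asarray(v_hat_s)
    return float(np.mean(td ** 2))
```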
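For the Puddle World setup quoted in the Experiment Setup row (Adam with learning rate $10^{-3}$, batch size 128, Gaussian RBF kernel with bandwidth $h = 100$), the sketch below illustrates one way those values could be wired into a kernel Bellman loss of the form $\frac{1}{n^2}\sum_{i,j} R_i\, K(s_i, s_j)\, R_j$ over TD residuals $R_i = r_i + \gamma V(s_i') - V(s_i)$. Both that quadratic form and the exact placement of the bandwidth inside the kernel are assumptions of this sketch, not statements from the reproducibility table.

```python
import numpy as np

def gaussian_rbf_kernel(x, y, h=100.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / h).

    h = 100 is the bandwidth quoted for Puddle World; dividing the squared
    distance by h (rather than, e.g., 2 * h**2) is an assumption of this sketch.
    """
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    sq_dists = (np.sum(x ** 2, axis=1)[:, None]
                + np.sum(y ** 2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-sq_dists / h)

def kernel_bellman_loss(states, rewards, values, next_values, gamma=0.99, h=100.0):
    """Minibatch estimate of the assumed kernel Bellman loss:
    L_K = (1 / n^2) * sum_{i,j} R_i * K(s_i, s_j) * R_j,
    with TD residuals R_i = r_i + gamma * V(s_i') - V(s_i).
    """
    residuals = (np.asarray(rewards)
                 + gamma * np.asarray(next_values)
                 - np.asarray(values))
    K = gaussian_rbf_kernel(states, states, h=h)
    return float(residuals @ K @ residuals) / (len(residuals) ** 2)
```

Under the quoted configuration, this loss would be evaluated on minibatches of 128 transitions and minimized with Adam at a learning rate of $10^{-3}$.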