Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm
Authors: Amir-massoud Farahmand
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper considers the problem of estimating the distribution of returns in reinforcement learning, i.e., distributional RL problem. It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it. We show that instead of representing a probability distribution function of returns, one can represent their characteristic function, the Fourier transform of their distribution. ... We analyze CVI and its approximate variant and show how approximation errors affect the quality of the computed CVF. ... This paper is only the first step towards understanding CVFs and their properties. ... Finally, empirically evaluating this approach for return uncertainty representation may lead to better understanding of its strengths and weaknesses. |
| Researcher Affiliation | Academia | Amir-massoud Farahmand, Vector Institute & University of Toronto, Toronto, Canada (farahmand@vectorinstitute.ai) |
| Pseudocode | No | The paper describes the Characteristic Value Iteration (CVI) and Approximate CVI (ACVI) procedures iteratively, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not include any statement or link indicating the provision of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on any experiments involving datasets, thus no information about public dataset access for training is provided. |
| Dataset Splits | No | The paper is theoretical and does not report on any experiments or dataset usage, therefore no information regarding training, validation, or test dataset splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not detail any specific software dependencies with version numbers needed to replicate its theoretical contributions. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, including hyperparameter values or system-level training settings. |
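
Since the paper gives no pseudocode, the idea of representing returns by their characteristic function can be illustrated with a small sketch. This is not the paper's algorithm block; it is a minimal example under stated assumptions: a finite Markov reward process with transition matrix `P`, a reward `r(x)` that is a deterministic function of the state (so the reward's characteristic function is `exp(j*omega*r(x))`), and a discount factor `gamma`. Because the return satisfies G(x) = r(x) + gamma * G(X'), its characteristic function V(omega; x) = E[exp(j*omega*G(x))] satisfies V(omega; x) = exp(j*omega*r(x)) * sum over x' of P(x, x') * V(gamma*omega; x'). The function name `characteristic_value_iteration` and all variable names are hypothetical.

```python
import numpy as np


def characteristic_value_iteration(P, r, gamma, omega, num_iters=200):
    """Approximate V(omega; x) = E[exp(j * omega * G(x))] for every state x.

    Assumptions (illustrative, not taken from the paper's text):
      P      -- (n, n) row-stochastic transition matrix
      r      -- (n,) deterministic per-state rewards
      gamma  -- discount factor in [0, 1)
      omega  -- a single real frequency
    The recursion V_{k+1}(omega) = c_R(omega) * P @ V_k(gamma * omega) is
    unrolled from the innermost frequency gamma**(k-1) * omega outward,
    starting from V_0 = 1 (the characteristic function of a zero return).
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    v = np.ones(P.shape[0], dtype=complex)
    for t in reversed(range(num_iters)):
        # Characteristic function of the reward at the rescaled frequency.
        reward_cf = np.exp(1j * omega * gamma**t * r)
        v = reward_cf * (P @ v)
    return v


# Usage: a two-state chain with reward 1 in state 0 and reward 0 in state 1.
P_demo = np.array([[0.9, 0.1], [0.2, 0.8]])
r_demo = np.array([1.0, 0.0])
print(characteristic_value_iteration(P_demo, r_demo, gamma=0.9, omega=0.5))
```

Evaluating one frequency at a time avoids having to store V on a frequency grid and interpolate at `gamma * omega`; a practical approximate variant would need some function representation over frequencies, which this sketch does not attempt.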