Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm

Authors: Amir-massoud Farahmand

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper considers the problem of estimating the distribution of returns in reinforcement learning, i.e., distributional RL problem. It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it. We show that instead of representing a probability distribution function of returns, one can represent their characteristic function, the Fourier transform of their distribution. ... We analyze CVI and its approximate variant and show how approximation errors affect the quality of the computed CVF. ... This paper is only the first step towards understanding CVFs and their properties. ... Finally, empirically evaluating this approach for return uncertainty representation may lead to better understanding of its strengths and weaknesses.
Researcher Affiliation | Academia | Amir-massoud Farahmand, Vector Institute & University of Toronto, Toronto, Canada, farahmand@vectorinstitute.ai
Pseudocode | No | The paper describes the Characteristic Value Iteration (CVI) and Approximate CVI (ACVI) procedures iteratively, but does not present them in a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not include any statement or link indicating the provision of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not report on any experiments involving datasets, thus no information about public dataset access for training is provided.
Dataset Splits | No | The paper is theoretical and does not report on any experiments or dataset usage, therefore no information regarding training, validation, or test dataset splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not detail any specific software dependencies with version numbers needed to replicate its theoretical contributions.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, including hyperparameter values or system-level training settings.
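
The Research Type response describes representing returns by their characteristic function and computing it iteratively. Since the paper contains no pseudocode block, the following is only an illustrative sketch of that idea, not the author's algorithm. It assumes a finite state space, a fixed policy already folded into a transition matrix `P`, a deterministic reward `r[x]` per state (so the reward's characteristic function is `exp(1j * omega * r[x])`), and a starting iterate equal to 1, the characteristic function of a zero return. The function name `cvi_sketch` and all parameter names are hypothetical.

```python
import cmath


def cvi_sketch(omega, P, r, gamma, num_iters):
    """Evaluate a truncated characteristic value function at frequency omega.

    Assumed setting (illustrative, not taken from the paper's text):
    finite states, P[x][y] = probability of moving from x to y under a
    fixed policy, deterministic reward r[x], discount factor gamma.
    Returns a list with one complex value per state.
    """
    n = len(r)
    # Starting iterate: characteristic function of a return that is always 0.
    V = [1 + 0j] * n
    # Unroll V_{k+1}(w; x) = c_R(w; x) * sum_y P[x][y] * V_k(gamma * w; y),
    # beginning at the innermost (most heavily discounted) frequency.
    for k in reversed(range(num_iters)):
        w = (gamma ** k) * omega
        V = [
            cmath.exp(1j * w * r[x]) * sum(P[x][y] * V[y] for y in range(n))
            for x in range(n)
        ]
    return V
```

For a single state with a self-loop and reward `r`, the truncated return is the constant `r * (1 - gamma**K) / (1 - gamma)`, so the sketch should return a unit-modulus complex exponential at that value, which gives a simple sanity check.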