Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions
Authors: Tian Tian, Kenny Young, Richard S. Sutton
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also empirically demonstrate DAVI's effectiveness in several experiments. [...] 6 Experiments [...] Figure 1 and Figure 2 show the performance of the algorithms. |
| Researcher Affiliation | Academia | Tian Tian, Kenny Young, Richard S. Sutton; University of Alberta and Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada; {ttian, kjyoung, rsutton}@ualberta.ca |
| Pseudocode | Yes | Algorithm 1: DAVI(m, p, q, τ). Input: State sampling distribution p (S); Input: A potentially state conditional distribution over the sets of actions of size m denoted by q; Input: Number of iterations τ, see Corollary 1 for how to choose τ to obtain an ϵ-optimal policy with high probability |
| Open Source Code | No | The checklist question 3(a) explicitly states: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]' |
| Open Datasets | No | The paper defines the structure of the MDPs used for experiments (e.g., 'single-state MDP with 10000 actions', 'tree with a depth of 2', 'random MDP with 100 states'), but it does not provide access information (URL, DOI, repository, or citation) for publicly available datasets. |
| Dataset Splits | No | The paper describes its experimental setup including different MDP structures and running each instance 200 times, but it does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The checklist question 3(d) explicitly states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]' |
| Software Dependencies | No | The paper describes the algorithms implemented ('VI, Asynchronous VI, and DAVI') and their sampling methods, but it does not specify any particular software, libraries, or their version numbers used in the implementation. |
| Experiment Setup | Yes | DAVI with m = 1 was significantly different from that of DAVI with m = 10, 100, 1000, and DAVI with m = 10, 100, 1000 converged at a similar rate. [...] This experiment consists of a single-state MDP with 10000 actions, all terminate immediately. [...] The first set consists of a tree with a depth of 2. Each state has 50 actions, where each action leads to 2 other distinct next states. [...] The second set consists of a random MDP with 100 states, where each state has 1000 actions. [...] The γ in all of the MDPs are 1. |
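
The Pseudocode and Experiment Setup rows describe DAVI only through its inputs (m, p, q, τ) and the test MDPs. As a rough illustration of the idea, the following is a minimal sketch and not the authors' implementation (the paper releases no code). It assumes a tabular MDP, uniform state sampling for p, uniform sampling of m distinct actions for q, and zero-initialized values; the function name `davi`, the `rewards`/`transitions` layout, and the 100-action example are all hypothetical choices made for this sketch.

```python
import random


def davi(rewards, transitions, gamma, m, num_iters, seed=0):
    """Sketch of value iteration that is asynchronous in states and actions.

    rewards[s][a]     -> immediate reward (float)
    transitions[s][a] -> list of (probability, next_state); next_state is
                         None when the action terminates the episode.
    All of these names and layouts are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    n_states = len(rewards)
    v = [0.0] * n_states   # value estimates (zero initialization is assumed)
    pi = [0] * n_states    # current greedy action stored for each state

    def lookahead(s, a):
        # One-step backup of action a in state s under the current values.
        future = sum(p * v[s2] for p, s2 in transitions[s][a] if s2 is not None)
        return rewards[s][a] + gamma * future

    for _ in range(num_iters):
        s = rng.randrange(n_states)                      # assumed uniform p
        n_actions = len(rewards[s])
        sampled = rng.sample(range(n_actions), min(m, n_actions))  # assumed uniform q
        candidates = set(sampled)
        candidates.add(pi[s])                            # keep the incumbent action
        best = max(candidates, key=lambda a: lookahead(s, a))
        v[s] = lookahead(s, best)                        # maximize over the subset only
        pi[s] = best
    return v, pi


# Hypothetical small analogue of the single-state experiment:
# one state, 100 actions, every action terminates immediately, gamma = 1.
n_actions = 100
rewards = [[a / n_actions for a in range(n_actions)]]
transitions = [[[(1.0, None)] for _ in range(n_actions)]]
v, pi = davi(rewards, transitions, gamma=1.0, m=10, num_iters=500)
print(pi[0], v[0])
```

Each iteration evaluates at most m + 1 actions instead of the full action set, which is the point of being asynchronous in actions. Keeping the incumbent action `pi[s]` among the candidates means a sampled subset that misses the best action found so far cannot discard it.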