A General Framework for Sequential Decision-Making under Adaptivity Constraints

Authors: Nuoya Xiong, Zhaoran Wang, Zhuoran Yang

ICML 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We experimented in the linear mixture MDP with the same setting as Appendix H in (Chen et al., 2022). We compare our ℓ2-EC-RS algorithm with OPERA (Chen et al., 2022), the optimal policy, and the random policy. The cumulative reward curves show that our algorithm converges to the optimal value slightly more slowly than OPERA. However, the average number of strategy transitions and calls to the optimization tool decreases from 2000 to 92.8 over 10 simulations, reducing the average execution time from 321.6 seconds to 15.9 seconds. We execute two algorithms on three different tasks, Hopper-v3, HalfCheetah-v2, and Walker2d-v3, for 100,000 episodes, with the same setting as Section 7 of (Liu et al., 2024). The comparisons of the rewards and the number of policy switches are shown in Figure 2 and Table 2.
Researcher Affiliation | Academia | ¹IIIS, Tsinghua University, China; ²Department of Industrial Engineering and Management Sciences, Northwestern University, USA; ³Department of Statistics and Data Science, Yale University, USA.
Pseudocode | Yes | Algorithm 1 ℓ2-EC-RS; Algorithm 2 ℓ2-EC-Batch; Algorithm 3 Modified ℓ1 ABC-Rare switch; Algorithm 4 ℓ2-EC-Adaptive Batch
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the proposed methods is open-source or publicly available.
Open Datasets | Yes | We experimented in the linear mixture MDP with the same setting as Appendix H in (Chen et al., 2022). We execute two algorithms on three different tasks, Hopper-v3, HalfCheetah-v2, and Walker2d-v3, for 100,000 episodes, with the same setting as Section 7 of (Liu et al., 2024).
Dataset Splits | No | The paper mentions total episodes (K or T) and some experimental parameters but does not provide specific percentages or counts for training, validation, or test dataset splits.
Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU types, memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | We choose T = 2000 and β = 0.3 log T in the experiment. We execute two algorithms on three different tasks, Hopper-v3, HalfCheetah-v2, and Walker2d-v3, for 100,000 episodes, with the same setting as Section 7 of (Liu et al., 2024).
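The reported linear mixture MDP setup fixes the horizon and the exploration parameter as T = 2000 and β = 0.3 log T. A minimal sketch of computing this parameter, assuming the natural logarithm (the base is not stated in the quoted setup):

```python
import math

# Reported setting: T = 2000 episodes, beta = 0.3 * log T.
# Assumption: "log" is the natural logarithm; the quoted setup
# does not specify the base.
T = 2000
beta = 0.3 * math.log(T)

print(f"T = {T}, beta = {beta:.4f}")
```

With base-10 logarithm instead, β would be about 0.99 rather than about 2.28, so the choice of base materially changes the amount of exploration.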