A General Framework for Sequential Decision-Making under Adaptivity Constraints
Authors: Nuoya Xiong, Zhaoran Wang, Zhuoran Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented in the linear mixture MDP with the same setting as Appendix H of (Chen et al., 2022). We compare our ℓ2-EC-RS algorithm with OPERA (Chen et al., 2022), the optimal policy, and the random policy. The cumulative reward curves show that our algorithm converges to the optimal value slightly more slowly than OPERA. However, the average number of strategy switches and calls to the optimization tool decreases from 2000 to 92.8 over 10 simulations, reducing the average execution time from 321.6 seconds to 15.9 seconds. We also execute the two algorithms on three different tasks, Hopper-v3, HalfCheetah-v2, and Walker2d-v3, for 100000 episodes, with the same setting as Section 7 of (Liu et al., 2024). The comparisons of the rewards and the number of policy switches are shown in Figure 2 and Table 2. |
| Researcher Affiliation | Academia | ¹IIIS, Tsinghua University, China; ²Department of Industrial Engineering and Management Sciences, Northwestern University, USA; ³Department of Statistics and Data Science, Yale University, USA. |
| Pseudocode | Yes | Algorithm 1 ℓ2-EC-RS; Algorithm 2 ℓ2-EC-Batch; Algorithm 3 Modified ℓ1 ABC-Rare switch; Algorithm 4 ℓ2-EC-Adaptive Batch |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for its proposed methods is open source or publicly available. |
| Open Datasets | Yes | We experimented in the linear mixture MDP with the same setting as Appendix H of (Chen et al., 2022). We execute two algorithms on three different tasks, Hopper-v3, HalfCheetah-v2, and Walker2d-v3, for 100000 episodes, with the same setting as Section 7 of (Liu et al., 2024). |
| Dataset Splits | No | The paper mentions total episodes (K or T) and some experimental parameters but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU types, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We choose T = 2000 and β = 0.3 log T in the experiment. We execute two algorithms on three different tasks, Hopper-v3, HalfCheetah-v2, and Walker2d-v3, for 100000 episodes, with the same setting as Section 7 of (Liu et al., 2024). |
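
The headline experimental claim above is the drop in policy switches, from 2000 (once per episode) to 92.8 on average. For context, the sketch below illustrates the generic determinant-doubling rare-switch test that underlies many low-switching algorithms for linear bandits and linear mixture MDPs. It is an illustrative assumption only, not the paper's exact ℓ2-EC-RS switching condition, and all names (`should_switch`, `cov`, the toy feature stream) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a determinant-doubling rare-switch test, a standard
# device in low-switching algorithms for linear bandits / linear mixture MDPs.
# This is NOT the paper's exact l2-EC-RS condition; it only illustrates why
# the number of policy updates can fall from ~T to O(log T).

def should_switch(cov_now: np.ndarray, cov_at_last_switch: np.ndarray) -> bool:
    """Recompute the policy only when the information (covariance) matrix has
    grown enough, measured by a doubling of its determinant."""
    return np.linalg.det(cov_now) > 2.0 * np.linalg.det(cov_at_last_switch)

# Toy run: feed random unit features for T = 2000 rounds and count switches.
rng = np.random.default_rng(0)
d, T, lam = 5, 2000, 1.0
cov = lam * np.eye(d)
cov_last = cov.copy()
num_switches = 0
for t in range(T):
    phi = rng.normal(size=d)
    phi /= np.linalg.norm(phi)
    cov += np.outer(phi, phi)          # rank-one information update
    if should_switch(cov, cov_last):
        num_switches += 1              # a full policy recomputation would go here
        cov_last = cov.copy()

print(f"policy switches over T={T} rounds: {num_switches}")
```

In this toy run the determinant of the covariance matrix can only double O(d log T) times, so the number of policy recomputations grows logarithmically rather than linearly in T, which is the qualitative behavior reported in the table above.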