Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
Authors: Zishun Yu, Xinhua Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that the proposed method improves the performance of the fine-tuned robotic agents on various simulated tasks. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA. Correspondence to: Zishun Yu <zyu32@uic.edu>. |
| Pseudocode | Yes | The pseudo-code of the offline, alignment, and online phases is provided in Algorithm 1, 2, and 3, respectively. |
| Open Source Code | Yes | The implementation of our ACA algorithm can be found at https://github.com/ZishunYu/ACA. |
| Open Datasets | Yes | We used the Half Cheetah, Hopper, and Walker2d environments from the D4RL-v2 datasets (Fu et al., 2020). (A loading sketch is given below the table.) |
| Dataset Splits | No | The paper mentions running experiments for a certain number of episodes and mini-batches but does not provide specific dataset split information (e.g., percentages or sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | Overall, all our implementations are from or based on d3rlpy (Takuma Seno, 2021), a popular RL library specialized for offline RL. |
| Experiment Setup | Yes | All offline/online experiments ran 5 random seeds. We ran all offline algorithms for 500 episodes with 1000 mini-batches each, and all online experiments for 100 episodes with 1000 environment interactions each. (A d3rlpy-style sketch of this schedule is given below the table.) |
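For the Open Datasets row, the following is a minimal loading sketch using the standard `d4rl` package. The `-medium-v2` dataset names are illustrative assumptions: the paper specifies the D4RL-v2 Half Cheetah, Hopper, and Walker2d environments, but the quoted response does not list which quality levels are used.

```python
# Minimal sketch (not from the paper): loading D4RL-v2 locomotion datasets.
# The "-medium-v2" names below are illustrative assumptions; the paper only
# names the HalfCheetah, Hopper, and Walker2d environments.
import gym
import d4rl  # noqa: F401  (importing registers the D4RL environments with gym)

for name in ["halfcheetah-medium-v2", "hopper-medium-v2", "walker2d-medium-v2"]:
    env = gym.make(name)
    data = env.get_dataset()  # dict of observations, actions, rewards, terminals
    print(name, data["observations"].shape, data["actions"].shape)
```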
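The Software Dependencies and Experiment Setup rows point to d3rlpy and to a schedule of 500 offline episodes of 1000 mini-batches plus 100 online episodes of 1000 environment interactions. Below is a hedged sketch of how that schedule might look with a d3rlpy 1.x-style API (entry points differ in d3rlpy 2.x). It uses a stock TD3+BC agent and the Hopper task purely as stand-ins; it is not the paper's ACA algorithm, whose offline, alignment, and online phases are given in Algorithms 1-3 of the paper.

```python
# Hedged sketch, assuming a d3rlpy 1.x-style API (entry points differ in d3rlpy 2.x).
# TD3+BC and "hopper-medium-v2" are illustrative stand-ins, not the paper's ACA setup.
import d3rlpy

for seed in range(5):  # the paper reports 5 random seeds
    d3rlpy.seed(seed)
    dataset, env = d3rlpy.datasets.get_d4rl("hopper-medium-v2")

    algo = d3rlpy.algos.TD3PlusBC(use_gpu=False)

    # Offline phase: 500 "episodes" of 1000 mini-batch updates each.
    algo.fit(dataset, n_steps=500 * 1000, n_steps_per_epoch=1000)

    # Online phase: 100 "episodes" of 1000 environment interactions each.
    buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=1_000_000, env=env)
    algo.fit_online(env, buffer, n_steps=100 * 1000, n_steps_per_epoch=1000)
```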