Actor-Critic Alignment for Offline-to-Online Reinforcement Learning

Authors: Zishun Yu, Xinhua Zhang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically that the proposed method improves the performance of the fine-tuned robotic agents on various simulated tasks.
Researcher Affiliation | Academia | Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA. Correspondence to: Zishun Yu <zyu32@uic.edu>.
Pseudocode | Yes | The pseudo-code of the offline, alignment, and online phases is provided in Algorithm 1, 2, and 3, respectively.
Open Source Code | Yes | The implementation of our ACA algorithm can be found at https://github.com/ZishunYu/ACA.
Open Datasets | Yes | We used the HalfCheetah, Hopper, and Walker2d environments from the D4RL-v2 datasets (Fu et al., 2020).
Dataset Splits | No | The paper mentions running experiments for a certain number of episodes and mini-batches but does not provide specific dataset split information (e.g., percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | Overall, all our implementations are from or based on d3rlpy (Takuma Seno, 2021), a popular RL library specialized for offline RL.
Experiment Setup | Yes | All offline/online experiments ran 5 random seeds. We ran all offline algorithms for 500 episodes with 1000 mini-batches each, and all online experiments for 100 episodes with 1000 environment interactions each.
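The Open Datasets and Software Dependencies rows above point to the D4RL-v2 benchmarks and the d3rlpy library. As a hedged illustration only, the sketch below shows one common way to fetch a cited environment/dataset pair through d3rlpy; the helper name `get_d4rl` and the dataset identifier `halfcheetah-medium-v2` are assumptions about the library version and data variant, not details confirmed by the paper.

```python
# Minimal sketch (not the authors' code): fetching a D4RL-v2 dataset through
# d3rlpy, the library the paper says its implementation builds on.
# `get_d4rl` and the id "halfcheetah-medium-v2" are assumptions that may
# differ across d3rlpy/D4RL versions and dataset variants.
import d3rlpy

dataset, env = d3rlpy.datasets.get_d4rl("halfcheetah-medium-v2")
print(type(dataset).__name__, env.observation_space, env.action_space)
```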
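The Experiment Setup row reports the training schedule but not the training code. Purely to make that schedule concrete, here is a hypothetical skeleton: 5 random seeds, an offline phase of 500 episodes with 1000 mini-batch updates each, and an online phase of 100 episodes with 1000 environment interactions each. `set_seed`, `update_offline`, and `interact_online` are placeholder callables, not part of the ACA implementation.

```python
# Hypothetical skeleton of the reported schedule; the callables are
# placeholders supplied by the caller, not the paper's ACA code.

SEEDS = range(5)                                  # "5 random seeds"
OFFLINE_EPISODES, BATCHES_PER_EPISODE = 500, 1000
ONLINE_EPISODES, STEPS_PER_EPISODE = 100, 1000


def run_schedule(set_seed, update_offline, interact_online):
    for seed in SEEDS:
        set_seed(seed)
        # Offline phase: 500 episodes x 1000 mini-batch updates on the dataset.
        for _ in range(OFFLINE_EPISODES):
            for _ in range(BATCHES_PER_EPISODE):
                update_offline()
        # Online fine-tuning: 100 episodes x 1000 environment interactions.
        for _ in range(ONLINE_EPISODES):
            for _ in range(STEPS_PER_EPISODE):
                interact_online()
```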