Become a Proficient Player with Limited Data through Watching Pure Videos

Authors: Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct the experiments on top of EfficientZero, which is the current SoTA model-based algorithm on the Atari 100k benchmark. Our framework achieves the SoTA on the 60-minute Atari games and significantly outperforms others. Experiments show that a model pre-trained jointly on data from distinct environments can be fine-tuned well to the corresponding environments without re-pre-training.
Researcher Affiliation | Academia | Weirui Ye (1,2,3), Yunsheng Zhang (2,3), Pieter Abbeel (4), Yang Gao (1,2,3). Affiliations: 1 Tsinghua University, 2 Shanghai Artificial Intelligence Laboratory, 3 Shanghai Qi Zhi Institute, 4 UC Berkeley. Emails: ywr20@mails.tsinghua.edu.cn, ys-zhang18@tsinghua.org.cn, pabbeel@berkeley.edu, gaoyangiiis@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Building Action Adapter (an illustrative sketch follows the table below).
Open Source Code | Yes | The code is available at https://github.com/YeWR/FICC.git.
Open Datasets | Yes | Pre-training dataset: We use the EfficientZero replay buffer as the pre-training dataset, training EfficientZero for 1M transitions from scratch and saving the resulting buffer. We conduct the experiments on top of EfficientZero, which is the current SoTA model-based algorithm on the Atari 100k benchmark. To further challenge the RL algorithm, we propose the Atari 50k benchmark, which consists of only 50k steps (one hour of game-play) of interactions. (A serialization sketch follows the table below.)
Dataset Splits | No | The paper mentions training for 50k steps and evaluation over 100 evaluations across 3 runs, but does not specify explicit training/validation/test splits for a static dataset, nor does it refer to predefined standard splits with citations.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using EfficientZero as the base model-based RL algorithm, which implies underlying software like Python and deep learning frameworks (e.g., PyTorch, TensorFlow), but it does not specify any software or library names with version numbers.
Experiment Setup | Yes | Hyper-parameters: The hyper-parameters for pre-training are listed in Appendix B. For fine-tuning, we train for 50k steps and follow the other hyper-parameter settings of EfficientZero, listed in Appendix C. We update the action adapter every 1000 transitions.
Table 8: Hyper-parameters for FICC pre-training.
Parameter | Setting
Observation down-sampling | 96 × 96
Frames stacked | 4
Frame skip | 4
Minibatch size | 256
Optimizer | SGD
Optimizer: learning rate | 0.02
Optimizer: momentum | 0.9
Optimizer: weight decay | 10^-4
Learning rate schedule | cosine, 0.02 → 0.0002
Max gradient norm | 10
Training steps | 50K
Pre-training data | 1M transitions
Unroll steps for training dynamics | 5
Shape of the state | 64 × 6 × 6
VQ: number of latent action embeddings | 20
VQ: dimension of latent action embeddings | 5
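The Table 8 settings above translate directly into a training configuration. Below is a minimal sketch that collects them in a Python dataclass and builds a matching PyTorch SGD optimizer with the cosine learning-rate schedule (0.02 annealed to 0.0002); the `model` argument and all field names are placeholders, not the released FICC code.

```python
# Minimal sketch of the Table 8 FICC pre-training settings as a config object.
# Field names and the `model` argument are placeholders, not the released code.
from dataclasses import dataclass

import torch


@dataclass
class FICCPretrainConfig:
    obs_downsample: tuple = (96, 96)     # observation down-sampling
    frames_stacked: int = 4
    frame_skip: int = 4
    batch_size: int = 256                # minibatch size
    lr: float = 0.02                     # annealed with a cosine schedule
    lr_min: float = 0.0002
    momentum: float = 0.9
    weight_decay: float = 1e-4
    max_grad_norm: float = 10.0
    training_steps: int = 50_000
    pretraining_transitions: int = 1_000_000
    unroll_steps: int = 5                # unroll steps for training dynamics
    state_shape: tuple = (64, 6, 6)
    num_latent_actions: int = 20         # VQ codebook size
    latent_action_dim: int = 5           # VQ embedding dimension


def make_optimizer(model: torch.nn.Module, cfg: FICCPretrainConfig):
    """SGD with momentum/weight decay plus a cosine LR schedule, as in Table 8."""
    opt = torch.optim.SGD(model.parameters(), lr=cfg.lr,
                          momentum=cfg.momentum, weight_decay=cfg.weight_decay)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=cfg.training_steps, eta_min=cfg.lr_min)
    return opt, sched
```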
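The Open Datasets row notes that the pre-training corpus is simply a saved EfficientZero replay buffer (1M transitions). The sketch below shows one way such a buffer could be serialized for action-free video pre-training; the buffer interface (`trajectories()`, `.observations`, `.actions`) and the file layout are assumptions, not the paper's released pipeline.

```python
# Hypothetical sketch: dump an EfficientZero-style replay buffer to disk so its
# observation sequences can be reused as a video pre-training dataset.
# The buffer interface and file layout here are assumptions, not the released code.
import numpy as np


def save_pretraining_dataset(replay_buffer, path, max_transitions=1_000_000):
    observations, actions = [], []
    total = 0
    for traj in replay_buffer.trajectories():  # assumed iterator over stored games
        observations.append(np.asarray(traj.observations, dtype=np.uint8))
        actions.append(np.asarray(traj.actions, dtype=np.int64))
        total += len(traj.actions)
        if total >= max_transitions:
            break
    # Only the frames are needed for action-free pre-training; the real actions
    # are kept alongside for analysis and for later action-adapter experiments.
    np.savez_compressed(path,
                        observations=np.concatenate(observations),
                        actions=np.concatenate(actions))
```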
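The Pseudocode row only names Algorithm 1 (Building Action Adapter), which is re-run every 1000 transitions during fine-tuning. The sketch below is one plausible reading rather than the paper's exact procedure: it assumes each real environment action is mapped to the pre-trained latent action code whose embedding best predicts the observed transitions, and every name in it (`build_action_adapter`, `dynamics_model`) is hypothetical.

```python
# Hypothetical sketch of an action adapter: map each real action to the latent
# action code (from the VQ codebook learned on videos) that best explains the
# transitions seen so far. The matching rule and all names are assumptions.
import torch


def build_action_adapter(dynamics_model, latent_action_embeddings, transitions):
    """
    dynamics_model: g(state, latent_action_embedding) -> predicted next state
    latent_action_embeddings: (K, D) codebook, K=20 codes of dimension D=5 (Table 8)
    transitions: iterable of (state, real_action, next_state) from real interaction
    Returns a dict mapping each real action id to its best latent action index.
    """
    num_codes = latent_action_embeddings.shape[0]
    errors = {}  # real action -> accumulated prediction error per latent code
    for state, action, next_state in transitions:
        errors.setdefault(action, [0.0] * num_codes)
        for k in range(num_codes):
            pred = dynamics_model(state, latent_action_embeddings[k])
            errors[action][k] += torch.mean((pred - next_state) ** 2).item()

    # Adapt each real action to the latent code with the lowest total error.
    return {a: min(range(num_codes), key=errs.__getitem__)
            for a, errs in errors.items()}
```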