On Representation Complexity of Model-based and Model-free Reinforcement Learning
Authors: Hanlin Zhu, Baihe Huang, Stuart Russell
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically corroborate our theory by comparing the approximation error of the transition kernel, reward function, and optimal Q-function in various Mujoco environments, which demonstrates that the approximation errors of the transition kernel and reward function are consistently lower than those of the optimal Q-function. |
| Researcher Affiliation | Academia | Hanlin Zhu , Baihe Huang , Stuart Russell EECS, UC Berkeley {hanlinzhu,baihe_huang,russell}@berkeley.edu |
| Pseudocode | No | The paper includes diagrams and theoretical descriptions of circuits, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/realgourmet/rep_complexity_rl. |
| Open Datasets | Yes | For common MuJoCo Gym environments (Brockman et al., 2016), including Ant-v4, Hopper-v4, HalfCheetah-v4, InvertedPendulum-v4, and Walker2d-v4 |
| Dataset Splits | No | The paper does not explicitly provide details about specific training, validation, and test splits for the data used in their neural network fitting experiments. It describes hyperparameters for training but not data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam (Kingma & Ba, 2014)' and 'Soft-Actor-Critic (Haarnoja et al., 2018)' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 1: Hyperparameter Value(s): Optimizer Adam (Kingma & Ba, 2014), Learning Rate 0.0003, Batch Size 1000, Number of Epochs 100000, Init_temperature 0.1, Episode length 1000, Discount factor 0.99, Number of hidden layers (all networks) 2, Number of hidden units per layer 256, Target update interval 1. Table 2: Hyperparameter Value(s): Optimizer Adam (Kingma & Ba, 2014), Learning Rate 0.001, Batch Size 32, Number of Epochs 100. |
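The Table 2 row above describes a standard supervised-fitting configuration (Adam, learning rate 0.001, batch size 32, 100 epochs). A minimal NumPy-only sketch of that kind of setup is given below; it is not the authors' code, and the target function, network width, and synthetic data are illustrative stand-ins for the paper's transition-kernel/reward/Q-function regression targets.

```python
import numpy as np

# Hedged sketch (not the paper's implementation): fit a small MLP with a
# hand-written Adam optimizer using Table 2's hyperparameters
# (lr 0.001, batch size 32, 100 epochs). Data and target are synthetic.
rng = np.random.default_rng(0)

X = rng.uniform(size=(256, 4))
y = X.sum(axis=1, keepdims=True)          # placeholder regression target

# One-hidden-layer ReLU MLP: 4 -> 64 -> 1 (width is illustrative).
W1 = rng.normal(scale=0.5, size=(4, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.5, size=(64, 1)); b2 = np.zeros(1)
params = [W1, b1, W2, b2]

# Adam state (Kingma & Ba, 2014), as cited in Table 2.
m = [np.zeros_like(p) for p in params]
v = [np.zeros_like(p) for p in params]
lr, beta1, beta2, eps, t = 0.001, 0.9, 0.999, 1e-8, 0

def forward(xb):
    h = np.maximum(xb @ W1 + b1, 0.0)
    return h, h @ W2 + b2

for epoch in range(100):                   # Table 2: 100 epochs
    perm = rng.permutation(len(X))
    for i in range(0, len(X), 32):         # Table 2: batch size 32
        idx = perm[i:i + 32]
        xb, yb = X[idx], y[idx]
        h, pred = forward(xb)
        # MSE loss, backpropagated by hand.
        g_out = 2.0 * (pred - yb) / len(idx)
        gW2 = h.T @ g_out; gb2 = g_out.sum(0)
        g_h = (g_out @ W2.T) * (h > 0)
        gW1 = xb.T @ g_h; gb1 = g_h.sum(0)
        t += 1
        for p, g, mi, vi in zip(params, [gW1, gb1, gW2, gb2], m, v):
            mi[:] = beta1 * mi + (1 - beta1) * g
            vi[:] = beta2 * vi + (1 - beta2) * g * g
            m_hat = mi / (1 - beta1 ** t)
            v_hat = vi / (1 - beta2 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)

_, pred = forward(X)
approx_error = float(np.mean((pred - y) ** 2))
print(f"final MSE (approximation-error proxy): {approx_error:.4f}")
```

In the paper's experiments the analogous fitted quantities are the transition kernel, the reward function, and the optimal Q-function, and the final fitting losses serve as the approximation errors being compared.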