Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

Authors: Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we conduct thorough analyses on a discretized Mountain Car environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.
Researcher Affiliation | Collaboration | ¹MIT, ²UT Austin, ³Meta AI. Correspondence to: Tongzhou Wang <tongzhou@mit.edu>.
Pseudocode | No | The paper presents equations for its objective function (e.g., Equation 12) but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. (A hedged sketch of the constrained objective is given below the table.)
Open Source Code | Yes | Code: github.com/quasimetric-learning/quasimetric-rl
Open Datasets | Yes | On offline maze2d tasks, QRL performs well in single-goal and multi-goal evaluations, improving > 37% over the best baseline and > 46% over the D4RL hand-coded reference controller (Fu et al., 2020). ... we use the Fetch robot environments from the GCRL benchmark (Plappert et al., 2018).
Dataset Splits | No | The paper does not provide specific details on train/validation/test dataset splits (e.g., percentages, sample counts, or an explicit splitting methodology) beyond naming the datasets used for training and evaluation.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used to run its experiments.
Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2014) as its optimizer but does not give version numbers for key software components such as the programming language (e.g., Python), the deep learning framework (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | All our results are aggregated from 5 runs with different seeds. QRL. Across all experiments, we use ε = 0.25, initialize the Lagrange multiplier λ = 0.01, and use Adam (Kingma & Ba, 2014) to optimize all parameters. ... Our learning rates are 0.01 for λ, 1 × 10⁻⁴ for the model parameters, and 3 × 10⁻⁵ for the policy parameters. We use a batch size of 256 in training. We prefill the replay buffer with 200 episodes from a random actor, and then iteratively perform (1) generating 10 rollouts and (2) optimizing the QRL objective for 500 gradient steps. We use N(0, 0.3²)-perturbed action noise in exploration. (See the configuration sketch below the table.)
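
Since the paper provides only equations rather than an algorithm block, here is a minimal PyTorch-style sketch of how the constrained objective behind Equation 12 could be implemented with the quantities quoted in the table: the learned quasimetric distance between sampled state-goal pairs is pushed up, while observed one-step distances are constrained not to exceed the transition cost, with slack ε = 0.25 and a Lagrange multiplier λ (initialized to 0.01). The function name `qrl_critic_losses` and the tensors `states`, `goals`, `next_states`, and `step_costs` are placeholders rather than names from the released code, and any monotone transform Equation 12 applies to the maximized distances is omitted.

```python
import torch
import torch.nn.functional as F


def qrl_critic_losses(quasimetric, lam, states, goals, next_states, step_costs, eps=0.25):
    """Sketch of a Lagrangian relaxation of the constrained quasimetric objective.

    `quasimetric(x, y)` is assumed to return non-negative distances d(x, y);
    `lam` is a scalar Lagrange-multiplier parameter (initialized to 0.01 in the paper).
    """
    # Push apart random state-goal pairs: larger d(s, g) is better, so negate for a loss.
    # (The exact transform applied to d(s, g) in Equation 12 is omitted in this sketch.)
    global_loss = -quasimetric(states, goals).mean()

    # Local constraint: one-step distances should not exceed the observed transition
    # cost; violations are penalized quadratically, with slack eps (0.25 in the paper).
    violation = F.relu(quasimetric(states, next_states) - step_costs).pow(2).mean()
    constraint_gap = violation - eps ** 2

    # The distance model minimizes the Lagrangian; lam is updated by dual ascent on the
    # same gap (hence the negated, detached term) and should be kept non-negative.
    critic_loss = global_loss + lam.detach() * constraint_gap
    lam_loss = -lam * constraint_gap.detach()
    return critic_loss, lam_loss
```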
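
The Experiment Setup row also translates directly into a training configuration. The sketch below collects the quoted hyperparameters and the reported online loop (prefill the buffer with 200 random-actor episodes, then alternate 10 rollouts with 500 gradient steps); `make_env`, `QRLAgent`, `ReplayBuffer`, `rollout`, and `random_policy` are hypothetical helpers, not part of the released code.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row; everything else is a placeholder.
CONFIG = dict(
    eps=0.25,                # constraint slack epsilon
    lambda_init=0.01,        # initial Lagrange multiplier
    lr_lambda=1e-2,          # Adam learning rate for lambda
    lr_model=1e-4,           # Adam learning rate for the model parameters
    lr_policy=3e-5,          # Adam learning rate for the policy parameters
    batch_size=256,
    prefill_episodes=200,    # random-actor episodes collected before training
    rollouts_per_iter=10,
    grad_steps_per_iter=500,
    action_noise_std=0.3,    # exploration noise ~ N(0, 0.3^2)
    num_seeds=5,             # results aggregated over 5 seeds
)


def train_one_seed(seed, cfg=CONFIG):
    """Sketch of the reported online loop for a single seed (helpers are hypothetical)."""
    env = make_env(seed)          # hypothetical environment constructor
    agent = QRLAgent(env, cfg)    # hypothetical agent wrapping the quasimetric model and policy
    buffer = ReplayBuffer()       # hypothetical replay buffer

    def noisy_policy(obs):
        # Exploration: Gaussian perturbation of the policy's action.
        action = agent.act(obs)
        return action + np.random.normal(0.0, cfg["action_noise_std"], size=np.shape(action))

    # 1) Prefill the replay buffer with episodes from a random actor.
    for _ in range(cfg["prefill_episodes"]):
        buffer.add_episode(rollout(env, random_policy))

    # 2) Alternate data collection and optimization until the training budget is used.
    while not agent.done_training():
        for _ in range(cfg["rollouts_per_iter"]):
            buffer.add_episode(rollout(env, noisy_policy))
        for _ in range(cfg["grad_steps_per_iter"]):
            agent.update(buffer.sample(cfg["batch_size"]))
```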