Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
Authors: Peide Huang, Mengdi Xu, Jiacheng Zhu, Laixi Shi, Fei Fang, Ding Zhao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments in locomotion and manipulation tasks and show that our proposed GRADIENT achieves higher performance than baselines in terms of learning efficiency and asymptotic performance. (Section 5, Experiments) |
| Researcher Affiliation | Academia | Peide Huang, Mengdi Xu, Jiacheng Zhu, Laixi Shi, Fei Fang, Ding Zhao Carnegie Mellon University Pittsburgh, PA 15213 {peideh, mengdixu, jzhu4, laixis, feifang, dingzhao}@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1: GRAdual Domain adaptation for curriculum reInforcEment learNing via optimal Transport (GRADIENT) and Algorithm 2: Compute Barycenter (a hedged sketch of the barycenter step follows the table) |
| Open Source Code | Yes | Code is available under https://github.com/PeideHuang/gradient.git (paper footnote 1) |
| Open Datasets | Yes | In FetchPush [58], the objective is to use the gripper to push the box to a goal position. The observation space is a 28-dimension vector, including information about the goal. The context is a 2-dimension vector representing the goal position on a surface. (Reference [58] is 'OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.') |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits in the traditional supervised learning sense. Reinforcement learning experiments involve agents interacting with environments, where 'data' is generated through this interaction rather than pre-defined splits. The paper defines 'source task distribution' and 'target task distribution' for generating curricula and evaluates performance on the target task. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | For the learner, we use the SAC [52] and PPO [53] implementations provided in the Stable Baselines3 library [54]. For the optimal transport computation, we use POT [55]. (No explicit version numbers for Stable Baselines3 or POT are provided in the text.) |
| Experiment Setup | Yes | Input: Source task distribution µ(c), target task distribution ν(c), interpolation factor α, distance metric d, reward threshold G, maximum number of stages K. We then generate curricula using GRADIENT with α = 0.2, 0.1, 0.05. (Hedged sketches of the barycenter interpolation and the stage-advancement loop follow the table.) |
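
The paper's Algorithm 2 computes Wasserstein barycenters to interpolate between the source and target task distributions, and the paper states it uses POT for the optimal transport computation. Below is a minimal, hedged sketch of that interpolation idea with POT; the 1-D context grid, the Gaussian-shaped histograms, and the `interpolate` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import ot  # POT: Python Optimal Transport, the library cited in the paper

# Hypothetical shared 1-D support for the context distributions (assumption:
# the paper's contexts can be multi-dimensional; 1-D keeps the sketch small).
support = np.linspace(0.0, 1.0, 50).reshape(-1, 1)

# Illustrative source and target histograms over that support.
mu = np.exp(-((support[:, 0] - 0.1) ** 2) / 0.005)
mu /= mu.sum()
nu = np.exp(-((support[:, 0] - 0.9) ** 2) / 0.005)
nu /= nu.sum()

# Squared-Euclidean ground cost between support points.
M = ot.dist(support, support)
M /= M.max()

def interpolate(mu, nu, t, reg=1e-2):
    """Entropic Wasserstein barycenter of (mu, nu) with weights (1 - t, t).

    t = 0 stays (approximately) at the source; t = 1 reaches the target.
    `reg` is the Sinkhorn regularization strength (a tuning assumption).
    """
    A = np.vstack([mu, nu]).T  # POT expects one histogram per column
    return ot.bregman.barycenter(A, M, reg, weights=np.array([1.0 - t, t]))

# With alpha = 0.2 (one of the paper's settings), the curriculum reaches
# the target distribution in 5 equal interpolation steps.
alpha = 0.2
curriculum = [interpolate(mu, nu, k * alpha) for k in range(6)]
```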
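
Algorithm 1's outer loop, as summarized in the Experiment Setup row, advances to the next curriculum stage once the agent clears the reward threshold G. The sketch below shows only that control flow; `train` and `evaluate` are hypothetical stand-ins for the paper's SAC/PPO training and evaluation (the paper uses the Stable Baselines3 implementations), and the per-stage budget cap is an assumption, not the paper's stopping rule.

```python
from typing import Callable, Sequence
import numpy as np

def gradient_outer_loop(
    stages: Sequence[np.ndarray],             # barycenter distributions, e.g. from the sketch above
    train: Callable[[np.ndarray], None],      # hypothetical: train the agent on tasks sampled from the stage
    evaluate: Callable[[np.ndarray], float],  # hypothetical: mean return of the agent on the stage
    G: float,                                 # reward threshold from Algorithm 1's inputs
    max_rounds_per_stage: int = 20,           # budget cap (assumption, not from the paper)
) -> None:
    """Advance through curriculum stages once performance exceeds G."""
    for k, stage in enumerate(stages):
        for _ in range(max_rounds_per_stage):
            train(stage)
            if evaluate(stage) >= G:
                break  # threshold met: move on to the next, harder stage
```

Capping the rounds per stage keeps the loop from stalling if a stage's threshold is never reached; how the paper handles that case is not stated in the quoted text.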