How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Authors: Ying Fan, Jingling Li, Adith Swaminathan, Aditya Modi, Ching-An Cheng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results also showcase the effectiveness of CODA, which outperforms other baseline methods across various context-goal relationships of the CGO problem. This approach offers a promising direction for solving CGO problems using offline datasets.
Researcher Affiliation | Collaboration | Ying Fan (1), Jingling Li (2), Adith Swaminathan (3), Aditya Modi (3), Ching-An Cheng (3); (1) University of Wisconsin-Madison, (2) ByteDance Research, (3) Microsoft Research
Pseudocode | Yes | Algorithm 1: CODA for CGO
Open Source Code | Yes | Code is publicly available at: https://github.com/yingfan-bot/coda
Open Datasets | Yes | For all experiments, we use the original AntMaze-v2 datasets (3 different mazes and 6 offline datasets) of D4RL [6] as dynamics datasets D_dyn, removing all rewards and terminals.
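The dataset preparation quoted above — taking D4RL AntMaze transition data and dropping all rewards and terminals to obtain a dynamics-only dataset D_dyn — can be sketched as follows. This is a minimal sketch: in the real pipeline the input dict would come from D4RL (e.g. via `d4rl.qlearning_dataset`), which requires MuJoCo, so a mocked dataset with illustrative shapes stands in for it here, and the helper name `make_dynamics_dataset` is our own.

```python
import numpy as np

def make_dynamics_dataset(dataset):
    """Strip reward/termination signals from a D4RL-style dataset dict,
    keeping only the transition (dynamics) information.
    In the real pipeline `dataset` would be loaded from D4RL, e.g.
    d4rl.qlearning_dataset(gym.make('antmaze-medium-play-v2'))."""
    keep = ('observations', 'actions', 'next_observations')
    return {k: v for k, v in dataset.items() if k in keep}

# Mocked stand-in for a loaded AntMaze dataset (shapes are illustrative only).
rng = np.random.default_rng(0)
raw = {
    'observations':      rng.standard_normal((10, 29)),
    'actions':           rng.standard_normal((10, 8)),
    'next_observations': rng.standard_normal((10, 29)),
    'rewards':           np.zeros(10),
    'terminals':         np.zeros(10, dtype=bool),
}

ddyn = make_dynamics_dataset(raw)  # dynamics dataset: no rewards, no terminals
```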
Dataset Splits | No | The paper mentions using 'offline datasets' and sampling 'training contexts' and 'test contexts', but it does not specify explicit training/validation/test splits with percentages, absolute counts, or references to predefined splits for reproduction.
Hardware Specification | Yes | For all methods, each training run takes about 8 h on an NVIDIA T4 GPU.
Software Dependencies | No | The paper specifies IQL as the backbone algorithm and lists its hyperparameters, but it does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For controlled experiments, we use IQL [18] as the same backbone offline algorithm for all methods, with the same set of hyperparameters. ... For IQL, we keep the hyperparameters γ = 0.99, τ = 0.9, β = 10.0, and α = 0.005 from [18], and tune the remaining hyperparameters on the antmaze-medium-play-v2 environment, choosing batch size = 1024 from candidate choices {256, 512, 1024, 2046}, learning rate = 10^-4 from candidate choices {5×10^-5, 10^-4, 3×10^-4}, and a 3-layer MLP with ReLU activations and 256 hidden units for all networks.
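The reported setup — the IQL hyperparameters kept from [18], the tuned batch size and learning rate, and a 3-layer ReLU MLP with 256 hidden units for all networks — can be collected into a small sketch. This is not the authors' code: the config field names are our own labels for the quoted values, "3-layer" is read here as three hidden layers, the input/output dimensions are illustrative, and a plain numpy forward pass stands in for the actual PyTorch networks.

```python
import numpy as np

# Hyperparameters as reported: the first four follow IQL [18]; batch size
# and learning rate were tuned on antmaze-medium-play-v2.
IQL_CONFIG = dict(
    gamma=0.99,          # discount factor
    expectile_tau=0.9,   # IQL expectile parameter
    beta=10.0,           # inverse temperature for advantage weighting
    polyak_alpha=0.005,  # target-network update rate
    batch_size=1024,     # chosen from {256, 512, 1024, 2046}
    learning_rate=1e-4,  # chosen from {5e-5, 1e-4, 3e-4}
    hidden_units=256,
    num_hidden_layers=3,
)

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(in_dim, out_dim, hidden=256, layers=3, seed=0):
    """Initialize an MLP with `layers` hidden ReLU layers of `hidden` units
    (the architecture reported for all networks), plus a linear output."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * layers + [out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b  # linear output layer

# Illustrative Q-network on concatenated (state, action) for AntMaze-like dims.
q_net = init_mlp(in_dim=29 + 8, out_dim=1,
                 hidden=IQL_CONFIG['hidden_units'],
                 layers=IQL_CONFIG['num_hidden_layers'])
batch = np.zeros((IQL_CONFIG['batch_size'], 37))
q = mlp_forward(q_net, batch)
```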