How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Authors: Ying Fan, Jingling Li, Adith Swaminathan, Aditya Modi, Ching-An Cheng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results also showcase the effectiveness of CODA, which outperforms other baseline methods across various context-goal relationships of the CGO problem. This approach offers a promising direction for solving CGO problems using offline datasets.
Researcher Affiliation | Collaboration | Ying Fan (1), Jingling Li (2), Adith Swaminathan (3), Aditya Modi (3), Ching-An Cheng (3); (1) University of Wisconsin-Madison, (2) ByteDance Research, (3) Microsoft Research
Pseudocode | Yes | Algorithm 1: CODA for CGO
Open Source Code | Yes | Code is publicly available at: https://github.com/yingfan-bot/coda
Open Datasets | Yes | For all experiments, we use the original AntMaze-v2 datasets (3 different mazes and 6 offline datasets) of D4RL [6] as dynamics datasets D_dyn, removing all rewards and terminals.
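The dataset preparation quoted above — taking D4RL AntMaze transition data and dropping all rewards and terminals to obtain a dynamics-only dataset D_dyn — can be sketched as follows. This is a minimal sketch: in the real pipeline the input dict would come from D4RL (e.g. via `d4rl.qlearning_dataset`), which requires MuJoCo, so a mocked dataset with illustrative shapes stands in for it here, and the helper name `make_dynamics_dataset` is our own.

```python
import numpy as np

def make_dynamics_dataset(dataset):
    """Strip reward/termination signals from a D4RL-style dataset dict,
    keeping only the transition (dynamics) information.
    In the real pipeline `dataset` would be loaded from D4RL, e.g.
    d4rl.qlearning_dataset(gym.make('antmaze-medium-play-v2'))."""
    keep = ('observations', 'actions', 'next_observations')
    return {k: v for k, v in dataset.items() if k in keep}

# Mocked stand-in for a loaded AntMaze dataset (shapes are illustrative only).
rng = np.random.default_rng(0)
raw = {
    'observations':      rng.standard_normal((10, 29)),
    'actions':           rng.standard_normal((10, 8)),
    'next_observations': rng.standard_normal((10, 29)),
    'rewards':           np.zeros(10),
    'terminals':         np.zeros(10, dtype=bool),
}

ddyn = make_dynamics_dataset(raw)  # dynamics dataset: no rewards, no terminals
```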
Dataset Splits | No | The paper mentions using 'offline datasets' and sampling 'training contexts' and 'test contexts', but it does not specify explicit training/validation/test splits with percentages, absolute counts, or references to predefined splits for reproduction.
Hardware Specification | Yes | For all methods, each training run takes about 8 h on an NVIDIA T4 GPU.
Software Dependencies | No | The paper specifies IQL as the backbone algorithm and lists its hyperparameters, but it does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For controlled experiments, we use IQL [18] as the same backbone offline algorithm for all methods, with the same set of hyperparameters. ... For IQL, we keep the hyperparameters γ = 0.99, τ = 0.9, β = 10.0, and α = 0.005 from [18], and tune the remaining hyperparameters on the antmaze-medium-play-v2 environment, choosing batch size = 1024 from candidate choices {256, 512, 1024, 2046}, learning rate = 10^-4 from candidate choices {5×10^-5, 10^-4, 3×10^-4}, and a 3-layer MLP with ReLU activations and 256 hidden units for all networks.
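The reported setup — the IQL hyperparameters kept from [18], the tuned batch size and learning rate, and a 3-layer ReLU MLP with 256 hidden units for all networks — can be collected into a small sketch. This is not the authors' code: the config field names are our own labels for the quoted values, "3-layer" is read here as three hidden layers, the input/output dimensions are illustrative, and a plain numpy forward pass stands in for the actual PyTorch networks.

```python
import numpy as np

# Hyperparameters as reported: the first four follow IQL [18]; batch size
# and learning rate were tuned on antmaze-medium-play-v2.
IQL_CONFIG = dict(
    gamma=0.99,          # discount factor
    expectile_tau=0.9,   # IQL expectile parameter
    beta=10.0,           # inverse temperature for advantage weighting
    polyak_alpha=0.005,  # target-network update rate
    batch_size=1024,     # chosen from {256, 512, 1024, 2046}
    learning_rate=1e-4,  # chosen from {5e-5, 1e-4, 3e-4}
    hidden_units=256,
    num_hidden_layers=3,
)

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(in_dim, out_dim, hidden=256, layers=3, seed=0):
    """Initialize an MLP with `layers` hidden ReLU layers of `hidden` units
    (the architecture reported for all networks), plus a linear output."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * layers + [out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b  # linear output layer

# Illustrative Q-network on concatenated (state, action) for AntMaze-like dims.
q_net = init_mlp(in_dim=29 + 8, out_dim=1,
                 hidden=IQL_CONFIG['hidden_units'],
                 layers=IQL_CONFIG['num_hidden_layers'])
batch = np.zeros((IQL_CONFIG['batch_size'], 37))
q = mlp_forward(q_net, batch)
```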