How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
Authors: Ying Fan, Jingling Li, Adith Swaminathan, Aditya Modi, Ching-An Cheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results also showcase the effectiveness of CODA, which outperforms other baseline methods across various context-goal relationships of the CGO problem. This approach offers a promising direction for solving CGO problems using offline datasets. |
| Researcher Affiliation | Collaboration | Ying Fan¹, Jingling Li², Adith Swaminathan³, Aditya Modi³, Ching-An Cheng³ (¹University of Wisconsin-Madison, ²ByteDance Research, ³Microsoft Research) |
| Pseudocode | Yes | Algorithm 1: CODA for CGO |
| Open Source Code | Yes | Code is publicly available at: https://github.com/yingfan-bot/coda. |
| Open Datasets | Yes | For all experiments, we use the original AntMaze-v2 datasets (3 different mazes and 6 offline datasets) of D4RL [6] as dynamics datasets D_dyn, removing all rewards and terminals (see the data-loading sketch below the table). |
| Dataset Splits | No | The paper mentions using 'offline datasets' and sampling 'training contexts' and 'test contexts', but it does not specify explicit training/validation/test dataset splits with percentages, absolute counts, or references to predefined splits for reproduction. |
| Hardware Specification | Yes | For all methods, each training run takes about 8h on an NVIDIA T4 GPU. |
| Software Dependencies | No | The paper specifies the use of IQL as the backbone algorithm and lists its hyperparameters, but it does not provide specific version numbers for software dependencies like IQL, Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For controlled experiments, we use IQL [18] as the same backbone offline algorithm for all the methods with the same set of hyperparameters. ... For IQL, we keep the hyperparameters γ = 0.99, τ = 0.9, β = 10.0, and α = 0.005 from [18], and tune other hyperparameters on the antmaze-medium-play-v2 environment, choosing batch size = 1024 from candidate choices {256, 512, 1024, 2048}, learning rate = 10⁻⁴ from candidate choices {5×10⁻⁵, 10⁻⁴, 3×10⁻⁴}, and 3-layer MLPs with ReLU activations and 256 hidden units for all networks (see the configuration sketch below the table). |
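
The dynamics-dataset construction quoted under Open Datasets can be reproduced roughly as follows. This is a minimal sketch assuming the standard `gym` + `d4rl` API; the helper name `load_dynamics_dataset` is illustrative, and the six dataset names are inferred from D4RL's standard AntMaze-v2 naming (3 mazes × 2 data-collection styles), not copied from the paper.

```python
# Minimal sketch: building reward-free dynamics datasets D_dyn from the
# AntMaze-v2 datasets of D4RL, per the quoted setup. Assumes the standard
# `gym` + `d4rl` packages; `load_dynamics_dataset` is an illustrative name.
import gym
import d4rl  # registers the AntMaze environments with gym

# 3 mazes x 2 data-collection styles = 6 offline datasets (standard D4RL names).
ANTMAZE_V2_DATASETS = [
    "antmaze-umaze-v2",
    "antmaze-umaze-diverse-v2",
    "antmaze-medium-play-v2",
    "antmaze-medium-diverse-v2",
    "antmaze-large-play-v2",
    "antmaze-large-diverse-v2",
]

def load_dynamics_dataset(name: str) -> dict:
    """Load a D4RL dataset and strip the reward/terminal labels."""
    env = gym.make(name)
    data = d4rl.qlearning_dataset(env)  # observations, actions, next_observations, ...
    data.pop("rewards")    # D_dyn keeps only dynamics information
    data.pop("terminals")
    return data

d_dyn = {name: load_dynamics_dataset(name) for name in ANTMAZE_V2_DATASETS}
```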
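
The Experiment Setup row can likewise be captured in a small configuration sketch. The `IQLConfig` dataclass and `build_mlp` helper below are illustrative scaffolding, not the authors' implementation; only the hyperparameter values come from the quoted text.

```python
# Minimal sketch of the quoted IQL hyperparameters and network architecture.
# `IQLConfig` and `build_mlp` are illustrative scaffolding, not the authors'
# code; only the numeric values are taken from the quoted setup.
from dataclasses import dataclass

import torch.nn as nn

@dataclass
class IQLConfig:
    gamma: float = 0.99          # discount factor (kept from [18])
    tau: float = 0.9             # IQL expectile (kept from [18])
    beta: float = 10.0           # advantage-weighting temperature (kept from [18])
    alpha: float = 0.005         # target-network soft-update rate (kept from [18])
    batch_size: int = 1024       # tuned from {256, 512, 1024, 2048}
    learning_rate: float = 1e-4  # tuned from {5e-5, 1e-4, 3e-4}
    num_layers: int = 3          # 3-layer MLP for all networks
    hidden_size: int = 256       # 256 hidden units per layer

def build_mlp(in_dim: int, out_dim: int, cfg: IQLConfig = IQLConfig()) -> nn.Sequential:
    """3-layer MLP with ReLU activations and 256 hidden units per layer."""
    layers, dim = [], in_dim
    for _ in range(cfg.num_layers):
        layers += [nn.Linear(dim, cfg.hidden_size), nn.ReLU()]
        dim = cfg.hidden_size
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)
```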