Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

Authors: Yuxin Pan, Yize Chen, Fangzhen Lin

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that AR2L is versatile in the sense that it improves policy robustness while maintaining an acceptable level of performance for the nominal case.
Researcher Affiliation | Academia | Yuxin Pan (1), Yize Chen (2), Fangzhen Lin (3). 1: EMIA, The Hong Kong University of Science and Technology; 2: AI Thrust, The Hong Kong University of Science and Technology (Guangzhou); 3: CSE, The Hong Kong University of Science and Technology. yuxin.pan@connect.ust.hk, yizechen@ust.hk, flin@cs.ust.hk
Pseudocode | Yes | The pseudocodes for both the exact AR2L algorithm and the approximate AR2L algorithm are presented in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the methodology described.
Open Datasets | No | The paper mentions creating "discrete and continuous datasets" and "mixture datasets" but does not provide concrete access information (link, DOI, repository, or formal citation with authors/year) for public availability. It states: "Each dataset consists of 3,000 problem instances, where each problem instance contains 150 items." and "During testing, we construct mixture datasets by randomly selecting β% nominal box sequences and reordering them using the learned permutation-based attacker for each packing policy." A hedged sketch of this mixture construction appears after the table.
Dataset Splits | No | The paper mentions "discrete and continuous datasets" and creating "mixture datasets" for evaluation, but it does not specify explicit train/validation/test splits (e.g., percentages, sample counts, or references to standard splits) for reproducing the experimental setup. For example, it only says "Each dataset consists of 3,000 problem instances" and describes how mixture datasets are constructed for evaluation, but not how training data was partitioned.
Hardware Specification | Yes | All the models are developed using PyTorch (Paszke et al., 2017) and trained on an NVIDIA RTX 3090 GPU and an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2017) but does not provide specific version numbers for PyTorch or any other software dependencies needed to recreate the experimental environment.
Experiment Setup | Yes | The rollout length for each iteration is set to 30, and the learning rate is set to 0.0003. To ensure a fair comparison, we keep the same hyperparameter settings mentioned above for each of the methods that we have reproduced. A sketch wiring these reported hyperparameters into a minimal training loop appears after the table.
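
The mixture-dataset construction quoted in the Open Datasets row can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' code: `make_mixture_dataset`, the placeholder box generator, and the stand-in `toy_attacker` are all hypothetical, and the paper's actual attacker is a learned permutation-based policy rather than a fixed reordering.

```python
import random

def make_mixture_dataset(nominal_sequences, attacker, beta):
    """Build a mixture dataset: reorder beta% of the nominal box
    sequences with a permutation attacker, keep the rest unchanged.

    `attacker` is a hypothetical callable mapping a box sequence to a
    permuted sequence; in the paper it is a learned policy evaluated
    against each packing policy.
    """
    n_attacked = int(len(nominal_sequences) * beta / 100.0)
    attacked_idx = set(random.sample(range(len(nominal_sequences)), n_attacked))
    return [
        attacker(seq) if i in attacked_idx else seq
        for i, seq in enumerate(nominal_sequences)
    ]

# Dataset shape from the paper: 3,000 problem instances, 150 items each.
# Box dimensions here are arbitrary placeholders, not the paper's sampler.
nominal = [
    [tuple(random.randint(1, 5) for _ in range(3)) for _ in range(150)]
    for _ in range(3000)
]
toy_attacker = lambda seq: list(reversed(seq))  # stand-in permutation
mixture = make_mixture_dataset(nominal, toy_attacker, beta=50)
```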
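
Similarly, the hyperparameters quoted in the Experiment Setup row (rollout length 30, learning rate 0.0003) can be dropped into a generic PyTorch loop. Only those two numbers come from the paper; the linear "policy", the fake rollout, and the dummy objective below are placeholder assumptions, not the AR2L loss or packing environment.

```python
import torch

# Reported hyperparameters; everything else is a placeholder.
ROLLOUT_LENGTH = 30
LEARNING_RATE = 3e-4

policy = torch.nn.Linear(9, 6)  # stand-in for the packing policy network
optimizer = torch.optim.Adam(policy.parameters(), lr=LEARNING_RATE)

def collect_rollout(policy, length=ROLLOUT_LENGTH):
    """Placeholder rollout returning a batch of fake observations.
    A real implementation would step the bin-packing environment."""
    return torch.randn(length, 9)

for iteration in range(3):  # a few iterations, for illustration only
    obs = collect_rollout(policy)
    loss = policy(obs).pow(2).mean()  # dummy objective, not the AR2L loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```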