ABM: Attention before Manipulation

Authors: Fan Zhuo, Ying He, Fei Yu, Pengteng Li, Zheyi Zhao, Xilong Sun

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method significantly outperforms the baselines in the zero-shot and compositional generalization experiment settings.
Researcher Affiliation | Collaboration | 1. Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ); 2. College of Computer Science and Software Engineering, Shenzhen University, China
Pseudocode | No | The paper describes the methodology in text and with diagrams, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | Visual results are provided at: ABM.github.io. This statement specifies visual results, not source code for the methodology.
Open Datasets | Yes | We utilize RLBench tools to generate training datasets with 100 demonstrations per task, as in RVT. For ABM data preprocessing, we employ the frozen, pretrained ViT-L/14@336px CLIP encoder to encode the images from four RGB-D cameras, and obtain patch-level dense features, which will be upscaled to match the resolution of the original images captured by the cameras during training. We train our model on 8 RLBench tasks... [James et al., 2020]. RLBench: The Robot Learning Benchmark & Learning Environment.
Dataset Splits | Yes | Specifically, there is no overlap between the objects in the training and validation sets, meaning that the objects in the validation set have never been seen by the model during the training process. Evaluations are scored as 0 for failures or 100 for complete successes, and we report average success rates by evaluating the model five times on the same 25 variation episodes per task in the seen, unseen, and compositional generalization evaluations.
Hardware Specification | Yes | We use a batch size of 30 to train our model and baseline methods on 6 NVIDIA RTX 3090 GPUs for 80k iterations with the LAMB optimizer... We assess the real-time performance of our model on an NVIDIA RTX 3090...
Software Dependencies | No | The paper mentions software tools like CoppeliaSim, PyRep, and PyTorch3D, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We use a batch size of 30 to train our model and baseline methods on 6 NVIDIA RTX 3090 GPUs for 80k iterations with the LAMB optimizer [You et al., 2019] and a learning rate of 0.003. During training, data augmentation involves random translations of point clouds within the range of [±0.125m, ±0.125m, ±0.125m], and random rotations around the yaw axis within the range of ±45°.
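The "Open Datasets" row describes upscaling patch-level CLIP dense features to match the resolution of the original camera images. The paper excerpt does not state the interpolation scheme, so the sketch below assumes an (h, w, c) patch-feature grid and nearest-neighbour upscaling; the function name and the patch size argument are illustrative, not taken from the authors' code.

```python
import numpy as np

def upsample_patch_features(feats, patch=14):
    """Nearest-neighbour upscale of an (h, w, c) patch-feature grid.

    Each spatial cell is repeated `patch` times along both axes, so a
    ViT with 14x14-pixel patches maps back onto the input resolution.
    (Assumed scheme; the paper does not specify the interpolation.)
    """
    return np.repeat(np.repeat(feats, patch, axis=0), patch, axis=1)
```

In practice a bilinear upsampling (e.g. `torch.nn.functional.interpolate`) would be an equally plausible choice; nearest-neighbour is shown only because it makes the patch-to-pixel correspondence explicit.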
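The augmentation quoted in the "Experiment Setup" row (random per-axis point-cloud translations within ±0.125 m and random yaw rotations within ±45°) can be sketched as follows. This is a minimal NumPy illustration of that description; the function name and signature are assumptions, not the authors' implementation.

```python
import numpy as np

def augment_point_cloud(points, rng, trans_range=0.125, yaw_range_deg=45.0):
    """Randomly translate and yaw-rotate an (N, 3) point cloud.

    trans_range: max |offset| in metres, drawn independently per axis.
    yaw_range_deg: max |rotation| in degrees about the vertical (z) axis.
    """
    offset = rng.uniform(-trans_range, trans_range, size=3)
    yaw = np.deg2rad(rng.uniform(-yaw_range_deg, yaw_range_deg))
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation about the z (yaw) axis, then a global translation.
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return points @ rot_z.T + offset
```

Because the transform is rigid, pairwise distances within the cloud are preserved, which is a convenient sanity check when wiring such augmentation into a training pipeline.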