ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces

Authors: Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby Lee, Jindong Chen

AAAI 2021, pp. 5931-5938 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed model on a wide variety of downstream tasks, ranging from icon classification to UI component retrieval based on its natural language description. Experiments show that the proposed ActionBert model outperforms multi-modal baselines across all downstream tasks by up to 15.5%.
Researcher Affiliation | Collaboration | Zecheng He (1), Srinivas Sunkara (2), Xiaoxue Zang (2), Ying Xu (2), Lijuan Liu (2), Nevan Wichers (2), Gabriel Schubiner (2), Ruby Lee (1), Jindong Chen (2); (1) Princeton University, (2) Google Research; {zechengh, rblee}@princeton.edu, {srinivasksun, xiaoxuez, yingyingxuxu, lijuanliu, wichersn, gsch, jdchen}@google.com
Pseudocode | No | The paper describes algorithms in text but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to its source code. It mentions a 'Robo app crawler', but that is an external tool.
Open Datasets | Yes | We use the Rico (Deka et al. 2017) dataset for this task. Rico is the largest public mobile app design dataset, containing 72k unique screenshots with their view hierarchies.
Dataset Splits | Yes | We split this data in the ratio of 80%:10%:10% to obtain the train, dev and test sets, respectively. We use 43.5k unique app UIs with their view hierarchies and app types, and split them in the ratio 80%, 10%, 10% for training, validation and testing. (See the split sketch after this table.)
Hardware Specification | Yes | ActionBert is pre-trained with 16 TPUs for three days.
Software Dependencies | No | The paper mentions software components and models such as BERT, ResNet, Faster-RCNN, and the Adam optimizer, but does not provide version numbers for software dependencies such as Python or PyTorch.
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba 2014) with learning rate r = 10^-5, β1 = 0.9, β2 = 0.999, ε = 10^-7 and batch size = 128 for training. We set λ_CUI = 0.1 and λ_mask = 0.01 in Eq. (5) during pre-training. (See the configuration sketch after this table.)
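
The Dataset Splits row reports an 80%/10%/10% train/dev/test split of the Rico screens. Below is a minimal sketch of such a split; the helper name, the use of screen IDs as the split unit, and the fixed random seed are illustrative assumptions, not details from the ActionBert paper or any released code.

```python
# Hypothetical helper illustrating the reported 80/10/10 train/dev/test split
# of Rico screens; the function name and use of screen IDs are assumptions.
import random

def split_rico_ids(screen_ids, seed=0):
    """Shuffle screen IDs and split them 80%/10%/10% into train/dev/test."""
    ids = list(screen_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.8 * len(ids))
    n_dev = int(0.1 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]

# Example usage on placeholder IDs (the paper reports ~43.5k unique app UIs):
train, dev, test = split_rico_ids(range(43500))
```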
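
The Experiment Setup row quotes concrete hyperparameters, so here is a hedged configuration sketch written in PyTorch for illustration (the paper does not state which framework it uses). The placeholder encoder and the exact weighted-sum form of Eq. (5) are assumptions; only the hyperparameter values come from the quoted text.

```python
# Sketch of the reported optimizer settings and pre-training loss weights.
# `encoder` is a stand-in module; the paper's actual model and the precise
# structure of Eq. (5) are not reproduced here.
import torch

encoder = torch.nn.Linear(768, 768)  # placeholder for the ActionBert encoder

optimizer = torch.optim.Adam(
    encoder.parameters(),
    lr=1e-5,             # learning rate 10^-5
    betas=(0.9, 0.999),  # β1, β2
    eps=1e-7,            # ε
)
BATCH_SIZE = 128         # reported batch size

LAMBDA_CUI, LAMBDA_MASK = 0.1, 0.01  # weights reported for Eq. (5)

def pretraining_loss(loss_main, loss_cui, loss_mask):
    """Assumed weighted sum of the pre-training objectives."""
    return loss_main + LAMBDA_CUI * loss_cui + LAMBDA_MASK * loss_mask
```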