UIBert: Learning Generic Multimodal Representations for UI Understanding

Authors: Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
Researcher Affiliation | Collaboration | Chongyang Bai (1), Xiaoxue Zang (2), Ying Xu (2), Srinivas Sunkara (2), Abhinav Rastogi (2), Jindong Chen (2), and Blaise Agüera y Arcas (2); (1) Dartmouth College, (2) Google Research; bchy1023@gmail.com, {xiaoxuez,yingyingxuxu,srinivasksun,abhirast,jdchen,blaisea}@google.com
Pseudocode | No | The paper describes the steps of its pre-training tasks and architecture in text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | No | The paper states 'We release two new datasets extended from Rico [Deka et al., 2017] for two tasks: similar UI component retrieval and referring expression component retrieval', with a footnote pointing to https://github.com/google-research-datasets/uibert. This explicitly indicates the release of datasets, not the source code for the methodology described in the paper.
Open Datasets | Yes | Our pretraining dataset consists of 537k pairs of UI screenshots and their view hierarchies obtained using the Firebase Robo app crawler [Firebase, 2020]. We use Rico data with human-labelled icon types for every VH leaf node in two levels of granularity: 32 and 77 classes [He et al., 2020].
Dataset Splits | Yes | We use 900k pairs for training, 32k pairs for dev, and 32k pairs for test. The train, dev, and test sets respectively contain 16.9k, 2.1k and 1.8k UI components with their referring expressions. We use the Rico SCA data [Li et al., 2020b], which has 25k synchronized and 47k unsynchronized UIs, and split them into train, dev, and test sets by a ratio of 8:1:1. We use all 72k unique UIs in Rico across a total of 27 app types and split them in the ratio of 8:1:1 for train, dev, and test. (A minimal split sketch appears after the table.)
Hardware Specification | Yes | We use Adam [Kingma and Ba, 2014] with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7 and batch size 128 on 16 TPUs for 350k steps.
Software Dependencies | No | The paper mentions software components and models like Albert [Lan et al., 2019], EfficientNet [Tan and Le, 2019], MLKit [MLKit, 2020], and Adam [Kingma and Ba, 2014], but it does not specify explicit version numbers for these software dependencies or libraries.
Experiment Setup | Yes | We use Adam [Kingma and Ba, 2014] with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7 and batch size 128 on 16 TPUs for 350k steps. For each finetuning task, we train the model for 200k steps with a dropout rate of 0.1, and use the same optimizer configuration and batch size as in pretraining. (The code sketch after the table restates these values.)
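
The 8:1:1 splits quoted in the Dataset Splits row can be reproduced deterministically. Below is a minimal Python sketch; the hash-based bucketing and the `ui_id` identifier are illustrative assumptions, since the paper does not describe how UIs were assigned to splits.

```python
# Minimal sketch of an 8:1:1 train/dev/test split. The paper does not
# specify its assignment mechanism; hash bucketing here is an assumption
# chosen so that the split is deterministic and reproducible.
import hashlib

def assign_split(ui_id: str) -> str:
    """Deterministically bucket a UI into train/dev/test at an 8:1:1 ratio."""
    bucket = int(hashlib.md5(ui_id.encode("utf-8")).hexdigest(), 16) % 10
    if bucket < 8:
        return "train"
    return "dev" if bucket == 8 else "test"
```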
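
The optimizer and schedule values reported in the Hardware Specification and Experiment Setup rows translate directly into code. The following is a minimal sketch assuming a TensorFlow/Keras setup; the framework choice is an assumption, as the paper does not name its training stack. Only the numbers come from the paper.

```python
# Reported UIBert training hyperparameters expressed as a tf.keras config.
# Framework choice is an assumption; the values are quoted from the paper.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-5,  # reported learning rate
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)

BATCH_SIZE = 128          # global batch size on 16 TPUs, as reported
PRETRAIN_STEPS = 350_000  # pre-training steps
FINETUNE_STEPS = 200_000  # steps per fine-tuning task
DROPOUT_RATE = 0.1        # dropout rate used during fine-tuning
```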