UIBert: Learning Generic Multimodal Representations for UI Understanding
Authors: Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy. |
| Researcher Affiliation | Collaboration | Chongyang Bai¹, Xiaoxue Zang², Ying Xu², Srinivas Sunkara², Abhinav Rastogi², Jindong Chen², and Blaise Agüera y Arcas²; ¹Dartmouth College, ²Google Research; bchy1023@gmail.com, {xiaoxuez,yingyingxuxu,srinivasksun,abhirast,jdchen,blaisea}@google.com |
| Pseudocode | No | The paper describes the steps of its pre-training tasks and architecture in text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper states 'We release two new datasets extended from Rico [Deka et al., 2017] for two tasks: similar UI component retrieval and referring expression component retrieval.2' with footnote 2 pointing to 'https://github.com/google-research-datasets/uibert'. This explicitly indicates the release of datasets, not the source code for the methodology described in the paper. |
| Open Datasets | Yes | Our pretraining dataset consists of 537k pairs of UI screenshots and their view hierarchies obtained using the Firebase Robo app crawler [Firebase, 2020]. We use Rico data with human-labelled icon types for every VH leaf node in two levels of granularity: 32 and 77 classes [He et al., 2020]. |
| Dataset Splits | Yes | We use 900k pairs for training, 32k pairs for dev, and 32k pairs for test. The train, dev, and test sets respectively contain 16.9k, 2.1k, and 1.8k UI components with their referring expressions. We use the Rico SCA data [Li et al., 2020b], which has 25k synchronized and 47k unsynchronized UIs, and split them into train, dev, and test sets by a ratio of 8:1:1. We use all the 72k unique UIs in Rico across a total of 27 app types and split them in the ratio of 8:1:1 for train, dev, and test. (An illustrative 8:1:1 split sketch is given below the table.) |
| Hardware Specification | Yes | We use Adam [Kingma and Ba, 2014] with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7 and batch size 128 on 16 TPUs for 350k steps. |
| Software Dependencies | No | The paper mentions software components and models such as Albert [Lan et al., 2019], EfficientNet [Tan and Le, 2019], MLKit [MLKit, 2020], and Adam [Kingma and Ba, 2014], but it does not specify version numbers for these software dependencies or libraries. |
| Experiment Setup | Yes | We use Adam [Kingma and Ba, 2014] with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7 and batch size 128 on 16 TPUs for 350k steps. For each finetuning task, we train the model for 200k steps with a dropout rate of 0.1, and use the same optimizer configuration and batch size as in pretraining. (An illustrative optimizer configuration sketch follows the table.) |
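
The 8:1:1 train/dev/test ratios quoted in the Dataset Splits row can be made concrete with a small splitting sketch. The snippet below is illustrative only: it assumes a simple shuffled split over placeholder UI identifiers and is not the authors' released code.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle and split a sequence into train/dev/test with an 8:1:1 ratio.

    Illustrative only; the paper does not describe its exact splitting code.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_dev = int(0.1 * n)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test

# Example: 72k placeholder UI ids split into roughly 57.6k / 7.2k / 7.2k.
train, dev, test = split_8_1_1(range(72_000))
```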
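
Similarly, the reported training configuration (Adam with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7, batch size 128, 350k pre-training steps, 200k fine-tuning steps) maps onto a standard optimizer setup. The sketch below uses TensorFlow Keras as an assumed framework; the paper does not name its training library, and the constants simply restate the reported hyperparameters.

```python
import tensorflow as tf

# Hyperparameters as reported in the paper; the framework choice (Keras) is an assumption.
BATCH_SIZE = 128
PRETRAIN_STEPS = 350_000
FINETUNE_STEPS = 200_000
DROPOUT_RATE = 0.1  # applied during fine-tuning

optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-5,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)
```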