UIBert: Learning Generic Multimodal Representations for UI Understanding

Authors: Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
Researcher Affiliation | Collaboration | Chongyang Bai (1), Xiaoxue Zang (2), Ying Xu (2), Srinivas Sunkara (2), Abhinav Rastogi (2), Jindong Chen (2), and Blaise Agüera y Arcas (2); (1) Dartmouth College, (2) Google Research; bchy1023@gmail.com, {xiaoxuez,yingyingxuxu,srinivasksun,abhirast,jdchen,blaisea}@google.com
Pseudocode | No | The paper describes the steps of its pre-training tasks and architecture in text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | No | The paper states 'We release two new datasets extended from Rico [Deka et al., 2017] for two tasks: similar UI component retrieval and referring expression component retrieval', with a footnote pointing to https://github.com/google-research-datasets/uibert. This explicitly indicates the release of datasets, not the source code for the methodology described in the paper.
Open Datasets | Yes | Our pretraining dataset consists of 537k pairs of UI screenshots and their view hierarchies obtained using the Firebase Robo app crawler [Firebase, 2020]. We use Rico data with human-labelled icon types for every VH leaf node in two levels of granularity: 32 and 77 classes [He et al., 2020].
Dataset Splits | Yes | We use 900k pairs for training, 32k pairs for dev, and 32k pairs for test. The train, dev, and test sets respectively contain 16.9k, 2.1k and 1.8k UI components with their referring expressions. We use the Rico SCA data [Li et al., 2020b], which has 25k synchronized and 47k unsynchronized UIs, and split them into train, dev, and test sets by a ratio of 8:1:1. We use all 72k unique UIs in Rico across a total of 27 app types and split them in the ratio of 8:1:1 for train, dev, and test. (A minimal split sketch appears after the table.)
Hardware Specification | Yes | We use Adam [Kingma and Ba, 2014] with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7 and batch size 128 on 16 TPUs for 350k steps.
Software Dependencies | No | The paper mentions software components and models like Albert [Lan et al., 2019], EfficientNet [Tan and Le, 2019], MLKit [MLKit, 2020], and Adam [Kingma and Ba, 2014], but it does not specify explicit version numbers for these software dependencies or libraries.
Experiment Setup | Yes | We use Adam [Kingma and Ba, 2014] with learning rate 1e-5, β1 = 0.9, β2 = 0.999, ϵ = 1e-7 and batch size 128 on 16 TPUs for 350k steps. For each finetuning task, we train the model for 200k steps with a dropout rate of 0.1, and use the same optimizer configuration and batch size as in pretraining. (The code sketch after the table restates these values.)
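
The 8:1:1 splits quoted in the Dataset Splits row can be reproduced deterministically. Below is a minimal Python sketch; the hash-based bucketing and the `ui_id` identifier are illustrative assumptions, since the paper does not describe how UIs were assigned to splits.

```python
# Minimal sketch of an 8:1:1 train/dev/test split. The paper does not
# specify its assignment mechanism; hash bucketing here is an assumption
# chosen so that the split is deterministic and reproducible.
import hashlib

def assign_split(ui_id: str) -> str:
    """Deterministically bucket a UI into train/dev/test at an 8:1:1 ratio."""
    bucket = int(hashlib.md5(ui_id.encode("utf-8")).hexdigest(), 16) % 10
    if bucket < 8:
        return "train"
    return "dev" if bucket == 8 else "test"
```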
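
The optimizer and schedule values reported in the Hardware Specification and Experiment Setup rows translate directly into code. The following is a minimal sketch assuming a TensorFlow/Keras setup; the framework choice is an assumption, as the paper does not name its training stack. Only the numbers come from the paper.

```python
# Reported UIBert training hyperparameters expressed as a tf.keras config.
# Framework choice is an assumption; the values are quoted from the paper.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-5,  # reported learning rate
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)

BATCH_SIZE = 128          # global batch size on 16 TPUs, as reported
PRETRAIN_STEPS = 350_000  # pre-training steps
FINETUNE_STEPS = 200_000  # steps per fine-tuning task
DROPOUT_RATE = 0.1        # dropout rate used during fine-tuning
```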