WIERT: Web Information Extraction via Render Tree

Authors: Zimeng Li, Bo Shao, Linjun Shou, Ming Gong, Gen Li, Daxin Jiang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate WIERT on the Klarna product page dataset, a manually labeled dataset of renderable e-commerce web pages, demonstrating its effectiveness and robustness.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Beihang University, Beijing, China; (2) Microsoft STCA
Pseudocode | No | The paper describes the model architecture and training process in text and a diagram (Figure 3), but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about open-sourcing the code or provide a link to a code repository.
Open Datasets | Yes | Klarna product page dataset: The Klarna product page dataset contains 51,701 manually labeled product pages from 8,175 real e-commerce websites (Hotti et al. 2021).
Dataset Splits | Yes | The Klarna dataset provides an official train/test split. In our experiments, we keep the official test set to measure generalization performance and split the official train set into a new train set and a validation set, without overlap, according to the ratio of 9:1.
Hardware Specification | Yes | All experiments are conducted on eight V100 GPUs.
Software Dependencies | No | The paper mentions using a "pretrained Big Bird model" and "BERT or RoBERTa as backbones" but does not specify version numbers for these or any other software dependencies such as programming languages or libraries.
Experiment Setup | Yes | For all experiments, we set the batch size to 16 and use an initial learning rate of 5 × 10⁻⁵, which decays to 85% after each epoch. Through coarse hyperparameter tuning, we set the weights of the three losses as λ1 = 1, λ2 = 0.2, λ3 = 0.1.
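
Since the paper does not release code, the following is a minimal sketch of the reported data split and training configuration as we understand it. The dataset, model, and individual losses below are placeholder stand-ins; only the 9:1 train/validation split, the batch size of 16, the initial learning rate of 5e-5 with 85% per-epoch decay, and the loss weights λ1 = 1, λ2 = 0.2, λ3 = 0.1 are taken from the paper.

```python
# Illustrative sketch only: the dataset, model, and losses are synthetic
# stand-ins, since WIERT's code and exact loss definitions are not released.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data standing in for the official Klarna train split.
features = torch.randn(1000, 128)
labels = torch.randint(0, 2, (1000,))
full_train = TensorDataset(features, labels)

# 9:1 train/validation split of the official train set; the official test set is untouched.
n_val = len(full_train) // 10
train_set, val_set = random_split(full_train, [len(full_train) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)  # batch size 16

model = nn.Linear(128, 2)  # stand-in for the WIERT encoder and prediction heads
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # initial learning rate 5e-5
# Learning rate decays to 85% of its previous value after each epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.85)

# Loss weights from the paper's coarse hyperparameter tuning.
lambda1, lambda2, lambda3 = 1.0, 0.2, 0.1
criterion = nn.CrossEntropyLoss()

for epoch in range(3):  # epoch count is not specified in this excerpt
    for x, y in train_loader:
        logits = model(x)
        # The paper combines three task losses; the same placeholder loss is
        # reused here purely to show the weighted sum.
        loss1 = loss2 = loss3 = criterion(logits, y)
        loss = lambda1 * loss1 + lambda2 * loss2 + lambda3 * loss3
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```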