WIERT: Web Information Extraction via Render Tree
Authors: Zimeng Li, Bo Shao, Linjun Shou, Ming Gong, Gen Li, Daxin Jiang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate WIERT on the Klarna product page dataset, a manually labeled dataset of renderable e-commerce web pages, demonstrating its effectiveness and robustness. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Beihang University, Beijing, China; (2) Microsoft STCA |
| Pseudocode | No | The paper describes the model architecture and training process in text and a diagram (Figure 3), but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about open-sourcing the code or provide a link to a code repository. |
| Open Datasets | Yes | The Klarna product page dataset contains 51,701 manually labeled product pages from 8,175 real e-commerce websites (Hotti et al. 2021). |
| Dataset Splits | Yes | The Klarna dataset provides an official train/test split. In our experiments, we keep the official test set to measure generalization performance and split the official training set into new, non-overlapping train and validation sets at a 9:1 ratio. |
| Hardware Specification | Yes | All experiments are conducted on eight V100 GPUs. |
| Software Dependencies | No | The paper mentions using a "pretrained Big Bird model" and "BERT or RoBERTa as backbones" but does not specify version numbers for these or any other software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For all experiments, we set the batch size to 16 and use an initial learning rate of 5 × 10⁻⁵, which decays to 85% of its value after each epoch. Through coarse hyperparameter tuning, we set the weights of the three losses to λ1 = 1, λ2 = 0.2, λ3 = 0.1. |
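The reported training setup (batch size 16, initial learning rate 5 × 10⁻⁵ decaying to 85% per epoch, and a weighted sum of three losses) can be sketched as plain arithmetic. This is a minimal illustration of the stated hyperparameters, not the authors' code; the function and constant names are invented for the example.

```python
# Hyperparameters as reported in the paper; names are illustrative.
BATCH_SIZE = 16
INITIAL_LR = 5e-5
DECAY = 0.85                                # LR multiplied by 0.85 after each epoch
LAMBDA1, LAMBDA2, LAMBDA3 = 1.0, 0.2, 0.1   # weights of the three losses

def lr_at_epoch(epoch: int) -> float:
    """Learning rate after `epoch` completed epochs of exponential decay."""
    return INITIAL_LR * DECAY ** epoch

def total_loss(l1: float, l2: float, l3: float) -> float:
    """Weighted sum of the three training losses."""
    return LAMBDA1 * l1 + LAMBDA2 * l2 + LAMBDA3 * l3

print(lr_at_epoch(0))   # 5e-05
print(lr_at_epoch(1))   # 4.25e-05
```

In a PyTorch training loop this decay schedule would typically correspond to `ExponentialLR` with `gamma=0.85`, stepped once per epoch.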