DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Authors: Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song

AAAI 2022, pp. 634-642

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We release a dataset of about 6K paired documents and slide decks used in our experiments. We show that our approach outperforms strong baselines and produces slides with rich content and aligned imagery.
Researcher Affiliation | Collaboration | Tsu-Jui Fu¹, William Yang Wang¹, Daniel McDuff², Yale Song²; ¹ UC Santa Barbara, ² Microsoft Research
Pseudocode | No | The paper includes architectural diagrams and mathematical equations but does not present any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Project webpage: https://doc2ppt.github.io/
Open Datasets | Yes | To help accelerate research in this domain, we release a dataset of about 6K paired documents and slide decks used in our experiments. Project webpage: https://doc2ppt.github.io/
Dataset Splits | Yes | Table 1: Descriptive statistics of our dataset. We report both the total count and the average number (in parentheses). Train / Val / Test: CV 2,073 / 265 / 262; NLP 741 / 93 / 97; ML 1,872 / 234 / 236; Total 4,686 / 592 / 595.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only general statements like 'We train our network end-to-end'.
Software Dependencies | No | The paper mentions software components like RoBERTa, ResNet-152, Bi-GRU, Seq2Seq, and ADAM, but it does not specify version numbers for these software dependencies.
Experiment Setup | Yes | For the DR, we use a Bi-GRU with 1,024 hidden units and set the MLPs to output 1,024-dimensional embeddings. Each layer of the PT is based on a 256-unit GRU. The PAR is designed as Seq2Seq (Bahdanau, Cho, and Bengio 2015) with 512-unit GRU. We train our network end-to-end using ADAM (Diederik P. Kingma 2014) with learning rate 3e-4. We tune the two hyper-parameters θR and θA via cross-validation (we set θR = 0.8, θA = 0.9).
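
To make the reported experiment setup easier to relate to code, the sketch below instantiates the stated module sizes in PyTorch. It is not the authors' released implementation: the class names (DocumentReader, ProgressTracker), the input feature dimension, and the MLP layout are assumptions, and the Seq2Seq PAR module with its 512-unit GRU is omitted.

```python
# Minimal sketch of the reported module sizes, assuming 1,024-d input features.
import torch
import torch.nn as nn

class DocumentReader(nn.Module):
    # DR: Bi-GRU with 1,024 hidden units; MLPs output 1,024-dimensional embeddings.
    def __init__(self, in_dim=1024, hidden=1024, out_dim=1024):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden),  # MLP layout is an assumption
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):                   # x: (batch, seq_len, in_dim)
        h, _ = self.bigru(x)                # h: (batch, seq_len, 2 * hidden)
        return self.mlp(h)                  # -> (batch, seq_len, out_dim)

class ProgressTracker(nn.Module):
    # PT: each layer is a 256-unit GRU (the hierarchical structure is omitted here).
    def __init__(self, in_dim=1024, hidden=256):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)

    def forward(self, x):
        out, _ = self.gru(x)
        return out

# Thresholds reported in the paper, tuned via cross-validation.
THETA_R, THETA_A = 0.8, 0.9

if __name__ == "__main__":
    dr, pt = DocumentReader(), ProgressTracker()
    # End-to-end training with ADAM, learning rate 3e-4, as reported.
    optim = torch.optim.Adam(list(dr.parameters()) + list(pt.parameters()), lr=3e-4)
    x = torch.randn(2, 10, 1024)            # dummy (batch, sequence, feature) input
    print(dr(x).shape, pt(dr(x)).shape)     # (2, 10, 1024) and (2, 10, 256)
```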