DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents
Authors: Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song (pp. 634-642)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We release a dataset of about 6K paired documents and slide decks used in our experiments. We show that our approach outperforms strong baselines and produces slides with rich content and aligned imagery. |
| Researcher Affiliation | Collaboration | Tsu-Jui Fu1, William Yang Wang1, Daniel McDuff2, Yale Song2 1 UC Santa Barbara 2 Microsoft Research |
| Pseudocode | No | The paper includes architectural diagrams and mathematical equations but does not present any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Project webpage: https://doc2ppt.github.io/ |
| Open Datasets | Yes | To help accelerate research in this domain, we release a dataset of about 6K paired documents and slide decks used in our experiments. Project webpage: https://doc2ppt.github.io/ |
| Dataset Splits | Yes | Table 1: Descriptive statistics of our dataset. We report both the total count and the average number (in parenthesis). Train / Val / Test: CV 2,073 / 265 / 262, NLP 741 / 93 / 97, ML 1,872 / 234 / 236, Total 4,686 / 592 / 595 |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only general statements like 'We train our network end-to-end'. |
| Software Dependencies | No | The paper mentions software components like RoBERTa, ResNet-152, Bi-GRU, Seq2Seq, and ADAM, but it does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | For the DR, we use a Bi-GRU with 1,024 hidden units and set the MLPs to output 1,024-dimensional embeddings. Each layer of the PT is based on a 256-unit GRU. The PAR is designed as Seq2Seq (Bahdanau, Cho, and Bengio 2015) with 512-unit GRU. We train our network end-to-end using ADAM (Diederik P. Kingma 2014) with learning rate 3e-4. We tune the two hyper-parameters θR and θA via cross-validation (we set θR = 0.8, θA = 0.9). |
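The reported dataset splits and training hyper-parameters can be sanity-checked with a short script. This is a minimal sketch: the per-domain counts and hyper-parameter values are quoted from Table 1 and the setup paragraph above, while the variable names (`splits`, `hparams`, etc.) are illustrative and not from the paper.

```python
# Sanity-check the Table 1 split counts and collect the quoted
# training hyper-parameters (identifier names are illustrative).

splits = {  # domain: (Train, Val, Test) counts from Table 1
    "CV": (2073, 265, 262),
    "NLP": (741, 93, 97),
    "ML": (1872, 234, 236),
}

# Column-wise sums should reproduce the reported Total row.
totals = tuple(sum(col) for col in zip(*splits.values()))
assert totals == (4686, 592, 595)

hparams = {  # values quoted from the Experiment Setup row
    "dr_bigru_hidden": 1024,   # DR: Bi-GRU hidden units
    "mlp_embed_dim": 1024,     # MLP output embedding size
    "pt_gru_hidden": 256,      # PT: GRU units per layer
    "par_gru_hidden": 512,     # PAR: Seq2Seq GRU units
    "optimizer": "ADAM",
    "learning_rate": 3e-4,
    "theta_R": 0.8,            # tuned via cross-validation
    "theta_A": 0.9,            # tuned via cross-validation
}

print(totals)  # (4686, 592, 595)
```

Running the script confirms the per-domain counts are internally consistent with the reported totals of 4,686 / 592 / 595 for Train / Val / Test.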