Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
Authors: Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou, Sibei Yang
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The rationales generated by DDCoT not only improve the reasoning abilities of both large and small language models in zero-shot prompting and fine-tuning learning, significantly outperforming state-of-the-art methods but also exhibit impressive generalizability and explainability. |
| Researcher Affiliation | Academia | ¹ShanghaiTech University, ²The University of Hong Kong |
| Pseudocode | No | The paper describes the steps of the DDCoT prompting but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions a "Project Page: https://toneyaya.github.io/ddcot/" but does not explicitly state that the source code for the methodology is available there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | ScienceQA benchmark [31] is the first multimodal science question-answer dataset comprising 21,000 questions with multiple choices and images. |
| Dataset Splits | Yes | Following previous works [71, 31], we divide ScienceQA into training, validation, and test sets, which contain 12,726, 4,241, and 4,241 examples, respectively. |
| Hardware Specification | Yes | All experiments are implemented by PyTorch [39] and Hugging Face [61] and conducted on NVIDIA Tesla A40 GPUs. |
| Software Dependencies | No | All experiments are implemented by PyTorch [39] and Hugging Face [61] and conducted on NVIDIA Tesla A40 GPUs. Specific version numbers for PyTorch or Hugging Face are not provided. |
| Experiment Setup | Yes | We train our model for 30 epochs with a learning rate of 1e-4 and batch size of 16. |
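
The split sizes and hyperparameters quoted in the table can be collected into a small configuration record, which makes it easy to check the arithmetic behind them. The sketch below is illustrative only: the `ReportedSetup` class and its method names are hypothetical and do not come from the paper or its code, and the steps-per-epoch figure assumes a single device, no gradient accumulation, and a kept final partial batch, none of which the table states.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above (class name is hypothetical)."""

    epochs: int = 30
    learning_rate: float = 1e-4
    batch_size: int = 16
    train_examples: int = 12_726
    val_examples: int = 4_241
    test_examples: int = 4_241

    def total_examples(self) -> int:
        # Sum of the three reported splits.
        return self.train_examples + self.val_examples + self.test_examples

    def split_fractions(self) -> tuple:
        # Share of the dataset in each split: (train, validation, test).
        total = self.total_examples()
        return (
            self.train_examples / total,
            self.val_examples / total,
            self.test_examples / total,
        )

    def steps_per_epoch(self) -> int:
        # Assumption: one device, no gradient accumulation,
        # and the last partial batch is kept rather than dropped.
        return math.ceil(self.train_examples / self.batch_size)


setup = ReportedSetup()
print("total examples:", setup.total_examples())
print("split fractions:", [round(f, 3) for f in setup.split_fractions()])
print("steps per epoch (assumed setup):", setup.steps_per_epoch())
```

The three splits sum to 21,208 examples, consistent with the rounded figure of 21,000 questions quoted in the Open Datasets row, and correspond to roughly a 60/20/20 division.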