Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions

Authors: Cheng Luo, Jianghui Wang, Bing Li, Siyang Song, Bernard Ghanem

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations on ResponseNet demonstrate that OmniResponse outperforms baseline models in terms of semantic speech content, audio-visual synchronization, and generation quality. Our dataset, code, and models are publicly available at https://omniresponse.github.io/.
Researcher Affiliation | Academia | ¹King Abdullah University of Science and Technology, ²University of Exeter
Pseudocode | No | The paper describes the model architecture and methodology in detail, but it does not include any explicitly labeled pseudocode or algorithm blocks. Figure 3 illustrates the architecture of TempoVoice but is a diagram, not pseudocode.
Open Source Code | No | Our dataset, code, and models are publicly available at https://omniresponse.github.io/. NeurIPS Paper Checklist Question 5: Does the paper provide open access to the data and code...? Answer: [No] Justification: All code and data will be made available upon acceptance of the paper.
Open Datasets | No | To fill the dataset gap, we introduce ResponseNet that comprises 696 temporally synchronized dyadic video pairs, totaling over 14 hours of natural conversational exchanges. Our dataset, code, and models are publicly available at https://omniresponse.github.io/. NeurIPS Paper Checklist Question 5: Does the paper provide open access to the data and code...? Answer: [No] Justification: All code and data will be made available upon acceptance of the paper.
Dataset Splits | No | The paper mentions evaluating on the "ResponseNet test set" in Table 2, but it does not provide specific details on how the dataset is split into training, validation, and test sets (e.g., percentages or sample counts) in the main text.
Hardware Specification | Yes | Our framework was implemented using PyTorch [52] and trained on four NVIDIA Tesla A100 GPUs.
Software Dependencies | No | Our framework was implemented using PyTorch [52] and trained on four NVIDIA Tesla A100 GPUs. The model optimization was performed using the AdamW optimizer [33] with a learning rate of 2 × 10⁻⁵, β1 = 0.9, β2 = 0.999, and a weight decay of 10⁻⁴, accompanied by a cosine learning rate scheduler. Training was executed with a batch size of one for 2,000 epochs. Additionally, we fine-tuned the LLM using the LoRA [26] technique with a LoRA rank of 64 and a LoRA alpha value of 16. While PyTorch, AdamW, LoRA, Spark-TTS, and MossFormer2 are mentioned, specific version numbers for these software dependencies are not provided.
Experiment Setup | Yes | The model optimization was performed using the AdamW optimizer [33] with a learning rate of 2 × 10⁻⁵, β1 = 0.9, β2 = 0.999, and a weight decay of 10⁻⁴, accompanied by a cosine learning rate scheduler. Training was executed with a batch size of one for 2,000 epochs. Additionally, we fine-tuned the LLM using the LoRA [26] technique with a LoRA rank of 64 and a LoRA alpha value of 16.
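
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The example below is illustrative only and is not the authors' code: the names `TrainConfig`, `cosine_lr`, and `lora_scaling` are hypothetical, and the schedule details (no warmup, decay to zero, one scheduler step per epoch) are assumptions, since the quoted text says only that a cosine learning rate scheduler was used. The learning rate and weight decay are read as 2 × 10⁻⁵ and 10⁻⁴. The LoRA scale uses the standard alpha / rank convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the quoted experiment setup.
    learning_rate: float = 2e-5
    beta1: float = 0.9
    beta2: float = 0.999
    weight_decay: float = 1e-4
    batch_size: int = 1
    epochs: int = 2000
    lora_rank: int = 64
    lora_alpha: int = 16
    # Assumption (not stated in the source): the schedule decays to zero.
    min_learning_rate: float = 0.0


def cosine_lr(epoch: int, cfg: TrainConfig) -> float:
    """Learning rate at `epoch` under plain cosine annealing.

    Assumes no warmup and one scheduler step per epoch.
    """
    # Clamp the epoch into [0, cfg.epochs] and convert to a 0..1 fraction.
    progress = min(max(epoch, 0), cfg.epochs) / cfg.epochs
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    span = cfg.learning_rate - cfg.min_learning_rate
    return cfg.min_learning_rate + span * cosine


def lora_scaling(cfg: TrainConfig) -> float:
    """Standard LoRA update scale: alpha divided by rank."""
    return cfg.lora_alpha / cfg.lora_rank


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (0, 500, 1000, 1500, 2000):
        print(epoch, f"{cosine_lr(epoch, cfg):.3e}")
    print("LoRA scaling:", lora_scaling(cfg))
```

With a rank of 64 and an alpha of 16, the low-rank update is scaled by 0.25 under this convention. Under the stated schedule assumptions, the learning rate starts at its peak and reaches half of it at the midpoint of training.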