Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation

Authors: Jiuhong Xiao, Roshan Nayak, Ning Zhang, Daniel Tortei, Giuseppe Loianno

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations across multiple RGB-T benchmarks demonstrate that ThermalGen achieves comparable or superior translation performance compared to existing GAN-based and diffusion-based methods. To our knowledge, ThermalGen is the first RGB-T image translation model capable of synthesizing thermal images that reflect significant variations in viewpoints, sensor characteristics, and environmental conditions.
Researcher Affiliation	Collaboration	Jiuhong Xiao New York University EMAIL Roshan Nayak New York University EMAIL Ning Zhang Technology Innovation Institute EMAIL Daniel Tortei Technology Innovation Institute EMAIL Giuseppe Loianno University of California, Berkeley EMAIL
Pseudocode	No	The paper describes the methodology in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Project page: xjh19971.github.io/ThermalGen ... We will publicly release the code, model, and datasets after review.
Open Datasets	Yes	To support large-scale training, we curated eight public satellite-aerial, aerial, and ground RGB-T paired datasets, and introduced three new large-scale satellite-aerial RGBT datasets DJI-day, Bosonplus-day, and Bosonplus-night captured across diverse times, sensor types, and geographic regions. ... We will publicly release the code, model, and datasets after review.
Dataset Splits	Yes	Table 1: Overview of RGB-T paired datasets. Sample numbers that correspond to each dataset's train/validation/test splits are provided. A total of 200k samples are used for large-scale training. For the satellite-aerial datasets, the number of samples is computed using a sample stride of 35 m, indicating the gap between adjacent samples.
Hardware Specification	Yes	C Additional Training Details: We conduct all training and evaluation using a single NVIDIA A100 or H100 GPU.
Software Dependencies	No	The paper mentions using specific optimizers like AdamW and frameworks like Latent Diffusion Model, but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	C Additional Training Details: We conduct all training and evaluation using a single NVIDIA A100 or H100 GPU. For training the thermal encoder and decoder, we employ a batch size of 16 and use the AdamW [33] optimizer with a learning rate of 6 × 10⁻⁵ and a weight decay of 1 × 10⁻³, over a total of 200k training steps. All other configurations follow the default settings of the Latent Diffusion Model [39]. In training the flow-based generative models, we use a batch size of 64 with AdamW optimizer at a learning rate of 1 × 10⁻⁴ and no weight decay, for 200k training steps. ... During training, each image is randomly resized and cropped to 256 × 256 pixels. For evaluation, images are resized to 256 × 256 pixels with denoising steps T = 50. The style embedding dimensionality is 1024.
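
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `TrainConfig`, `AUTOENCODER`, `FLOW_MODEL`, and `samples_drawn` are hypothetical, and the learning rates and weight decay assume that the exponents lost in extraction were negative (for example, 6 × 10⁻⁵).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters for one training stage (hypothetical container)."""
    batch_size: int
    learning_rate: float
    weight_decay: float
    train_steps: int
    optimizer: str = "AdamW"


# Stage 1: thermal encoder and decoder (values quoted in the table row).
AUTOENCODER = TrainConfig(
    batch_size=16, learning_rate=6e-5, weight_decay=1e-3, train_steps=200_000
)

# Stage 2: flow-based generative model (no weight decay).
FLOW_MODEL = TrainConfig(
    batch_size=64, learning_rate=1e-4, weight_decay=0.0, train_steps=200_000
)

# Shared settings quoted in the same row.
IMAGE_SIZE = (256, 256)       # random resize-and-crop target, in pixels
EVAL_DENOISING_STEPS = 50     # T = 50 at evaluation time
STYLE_EMBEDDING_DIM = 1024


def samples_drawn(cfg: TrainConfig) -> int:
    """Number of training samples drawn over the run, counting repeats."""
    return cfg.batch_size * cfg.train_steps


if __name__ == "__main__":
    for name, cfg in (("autoencoder", AUTOENCODER), ("flow model", FLOW_MODEL)):
        print(f"{name}: {samples_drawn(cfg):,} samples drawn")
```

Anything not listed here (learning-rate schedule, remaining autoencoder settings) is stated in the row to follow the Latent Diffusion Model defaults, so a faithful reproduction would still need those defaults and the unreported library versions.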