Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Realistic Earth-Observation Constellation Scheduling: Benchmark and Methodology

Authors: Luting Wang, Yinghao Xiang, Hongliang Huang, Dongjun Li, Chen Gao, Si Liu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that AEOSFormer outperforms baseline models in task completion and energy efficiency, with ablation studies highlighting the contribution of each component.
Researcher Affiliation	Academia	Luting Wang, Yinghao Xiang, Hongliang Huang, Dongjun Li, Chen Gao, Si Liu; Beihang University
Pseudocode	No	The paper describes the methodology using text and diagrams (Figures 2, 3, 5, 6) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code and data are provided in https://github.com/buaa-colalab/AEOSBench.
Open Datasets	Yes	All benchmark data and annotations are publicly accessible. The test split incorporates real satellite data from publicly available sources [1], enabling evaluation on authentic data. [1] N2YO (www.n2yo.com) and Gunter's Space Page (space.skyrocket.de).
Dataset Splits	Yes	We partition AEOS-Bench into four splits. The train split consists of 16,218 trajectories with 2,907 satellite assets. The val-seen split includes 64 scenarios using the same satellites as the train split. The val-unseen split features 64 scenarios with 500 satellites not present in the train split. The test split contains 64 scenarios with 500 satellites, each having realistic properties sourced from the web.
Hardware Specification	Yes	Both training and evaluation are performed on a Linux server with 256 CPU cores, 984 GB RAM, and 8 RTX 4090 GPUs.
Software Dependencies	No	The paper mentions software components like the 'AdamW optimizer' and the 'Basilisk engine' but does not provide specific version numbers for these or any other key software dependencies required for replication.
Experiment Setup	Yes	The internal constraint module C is implemented as a multilayer perceptron (MLP) with two hidden layers of width 1024. The transformer encoder E and decoder D are configured with a width of 512, a depth of 12, and 16 attention heads. All loss weights are set to w_s = w_t = w_a = 1. Training uses the AdamW optimizer [24] with a base learning rate of 10^-4, β1 = 0.9, β2 = 0.98, and weight decay of 10^-4. Each training batch contains 48 timesteps uniformly sampled from a trajectory. Each supervised stage spans 30,000 iterations, with the learning rate warmed up linearly from 10^-8 to 10^-4 over the first 10,000 iterations. The complete iterative pipeline comprises three supervised stages, for a total of 90,000 iterations.
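The learning-rate description in the Experiment Setup row can be sketched as a schedule function. This is a hypothetical illustration, not code from the AEOSBench repository: the function name learning_rate and the constant names are ours, and two details the description leaves open are assumed, namely that the rate stays at the base value after warm-up and that the warm-up restarts at the beginning of each of the three supervised stages.

```python
# Hypothetical sketch of the per-stage learning-rate schedule.
# Assumptions (not stated in the source): the rate is held constant at
# BASE_LR after warm-up, and the warm-up restarts at every stage.

BASE_LR = 1e-4          # base learning rate from the setup description
WARMUP_START_LR = 1e-8  # learning rate at the first iteration of a stage
WARMUP_ITERS = 10_000   # length of the linear warm-up
STAGE_ITERS = 30_000    # iterations per supervised stage
NUM_STAGES = 3          # supervised stages in the iterative pipeline
TOTAL_ITERS = STAGE_ITERS * NUM_STAGES  # 90,000 iterations in total


def learning_rate(global_iter: int) -> float:
    """Return the learning rate for a 0-based global iteration index."""
    if not 0 <= global_iter < TOTAL_ITERS:
        raise ValueError("iteration outside the training pipeline")
    # Position within the current stage (assumed to reset every stage).
    stage_iter = global_iter % STAGE_ITERS
    if stage_iter < WARMUP_ITERS:
        # Linear interpolation from WARMUP_START_LR toward BASE_LR.
        frac = stage_iter / WARMUP_ITERS
        return WARMUP_START_LR + frac * (BASE_LR - WARMUP_START_LR)
    # Assumed constant after warm-up; the source does not specify decay.
    return BASE_LR
```

Under these assumptions, learning_rate(0) and learning_rate(30_000) both return 1e-8 (the start of a stage), learning_rate(10_000) returns 1e-4, and TOTAL_ITERS equals the 90,000 iterations reported above. If the actual schedule decays after warm-up, only the final return statement would change.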
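The phrase "48 timesteps uniformly sampled from a trajectory" could be realized as in the sketch below. The helper sample_batch is hypothetical, and two choices are assumptions because the description does not settle them: timesteps are drawn without replacement, and they are returned in temporal order.

```python
import random
from typing import List, Optional, Sequence, TypeVar

StepT = TypeVar("StepT")

BATCH_TIMESTEPS = 48  # timesteps per training batch, per the setup description


def sample_batch(
    trajectory: Sequence[StepT],
    k: int = BATCH_TIMESTEPS,
    rng: Optional[random.Random] = None,
) -> List[StepT]:
    """Draw k timesteps uniformly at random from a single trajectory.

    Assumption: sampling is without replacement and the selected timesteps
    keep their original (temporal) order.
    """
    if len(trajectory) < k:
        raise ValueError(
            f"trajectory has {len(trajectory)} timesteps, need at least {k}"
        )
    rng = rng or random.Random()
    # random.sample picks k distinct indices; every k-subset is equally likely.
    indices = sorted(rng.sample(range(len(trajectory)), k))
    return [trajectory[i] for i in indices]
```

For example, with a stand-in trajectory list(range(1000)), sample_batch(trajectory, rng=random.Random(0)) returns 48 distinct, increasing timestep indices. Passing an explicit random.Random makes a batch reproducible, which matters when comparing runs.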