Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Mitigating Real-World Distribution Shifts in the Fourier Domain
Authors: Kiran Krishnamachari, See-Kiong Ng, Chuan-Sheng Foo
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through extensive empirical evaluations across time-series, image classification and semantic segmentation tasks that FMM is effective both individually and when combined with a variety of existing methods to overcome real-world distribution shifts. |
| Researcher Affiliation | Academia | Kiran Krishnamachari: Institute for Infocomm Research (I2R), A*STAR, Singapore; School of Computing, National University of Singapore, Singapore. See-Kiong Ng: Institute of Data Science, National University of Singapore, Singapore; School of Computing, National University of Singapore, Singapore. Chuan-Sheng Foo: Institute for Infocomm Research (I2R), A*STAR, Singapore; Centre for Frontier AI Research (CFAR), A*STAR, Singapore. |
| Pseudocode | Yes | Algorithm 1: Fourier Moment Matching. Input: labeled source-domain training samples Ds = {(x_s^i, y_s^i)}, i ∈ {1, …, N}; unlabeled target-domain samples Dt = {x_t^j}, j ∈ {1, …, M}; a model f with initial parameters θ. Step 1: compute Fourier statistics for source and target data (μ_s, C_s, μ_t, C_t): μ_s = (1/N) Σ_{i=1}^{N} ‖F(x_s^i)‖ and μ_t = (1/M) Σ_{j=1}^{M} ‖F(x_t^j)‖; C_s = (1/(N−1)) Σ_{i=1}^{N} (‖F(x_s^i)‖ − μ_s)(‖F(x_s^i)‖ − μ_s)^T and C_t = (1/(M−1)) Σ_{j=1}^{M} (‖F(x_t^j)‖ − μ_t)(‖F(x_t^j)‖ − μ_t)^T. Step 2: transform source data to match the target statistics in the Fourier domain. For i = 1 to N: A_s^i = ‖F(x_s^i)‖ {compute DFT amplitudes}; P_s^i = phase(F(x_s^i)) {compute DFT phase}; if FMM 1st order: FMM(A_s^i) = A_s^i − μ_s + μ_t; else if FMM 2nd order: FMM(A_s^i) = (A_s^i − μ_s) C_s^{−1/2} C_t^{1/2} + μ_t; x̂_s^i = F^{−1}(FMM(A_s^i), P_s^i) {inverse DFT}; store the FMM-transformed source sample x̂_s^i. Step 3: standard training on FMM-transformed source data. For T training iterations: sample K labeled, FMM-transformed source-domain images; compute the classifier loss L(x̂_s^i, y_s^i) for i = 1 to K; update the classifier f(·; θ) to minimize the loss. Step 4: standard evaluation on target-domain data: predict f(x_t^j; θ) for j = 1 to M. |
| Open Source Code | Yes | Code is available at https://github.com/kiranchari/FourierMomentMatching. |
| Open Datasets | Yes | We adopted the Sleep-EDF dataset (Goldberger et al., 2000), which contains EEG readings from 20 healthy subjects. We used the TAU Urban Audio dataset (Heittola et al., 2020b) as provided in the development set of (Heittola et al., 2020a). We evaluated methods on unsupervised domain adaptation from clean (source) to corrupted (target) images (CIFAR10 → CIFAR10-C and ImageNet → ImageNet-C; Hendrycks & Dietterich, 2019). We also benchmarked on iWildCam-WILDS (Beery et al., 2020; Sagawa et al., 2022) and Camelyon17-WILDS (Bandi et al., 2018). For semantic segmentation, we used the Transfer Learning Library to train models using ERM, FDA (Yang & Soatto, 2020) and AdvENT (Vu et al., 2019), and benchmarked domain-adaptive semantic segmentation from Cityscapes to Foggy Cityscapes (Cordts et al., 2016) and from Synthia (Ros et al., 2016) to Cityscapes. |
| Dataset Splits | Yes | We selected a single channel (i.e., Fpz-Cz), and 10 different subjects to construct five cross-domain (cross-subject) scenarios as proposed in (Ragab et al., 2023). We used 10 hours of labeled training data from device A as the source domain, while the smaller datasets of the other devices were used as target domains. Following the protocol in the WILDS framework, we used the model with the best validation domain performance, averaged across ten runs with different random seeds. |
| Hardware Specification | No | The paper mentions computational complexity and feasibility for high-resolution input on "standard machines" but does not provide specific details on the hardware (GPU/CPU models, memory) used for running its experiments. For example, Section 3.2.1 states: "The space and time complexity of these operations can grow as O(D³), which can be infeasible for high-resolution input on standard machines, e.g. ImageNet-size images." This is a general observation about computational requirements, not a specification of the experimental hardware. |
| Software Dependencies | No | The paper mentions several software components, such as the AdaTime library, the Transfer Learning Library, a GitHub repository for HRDA, and `MelSpectrogram` usage parameters, but does not specify version numbers for these or other key libraries/frameworks. For example, Appendix D.1 shows `mel_spectrogram = MelSpectrogram(...)` but does not state the version of the library used. |
| Experiment Setup | Yes | All methods were trained for 40 epochs with a batch size of 128. The Adam optimizer with fixed weight decay (1e-4) and (β1, β2) = (0.5, 0.99) was used to train all models. For each method, the learning rate and other hyperparameters were chosen using an extensive random search over 100 hyperparameter combinations per method and a target validation set (see Appendix C.1 for details). On CIFAR10, we trained all models for 150 epochs using the initial learning rate (lr) that produced the best target-domain validation performance, selected from {0.1, 0.01, 0.001}, with lr decayed by a factor of 0.1 every 50 epochs. On ImageNet50, we trained all models for 90 epochs with the initial learning rate that produced the best target-domain validation performance, selected from {0.1, 0.01, 0.001}, with lr decayed by a factor of 0.1 every 30 epochs. For each method, hyperparameters were selected on one task and applied to the other tasks, requiring them to generalize across tasks (see Appendix Table 12). Appendix C.1, Table 9 provides hyperparameters for sleep-stage classification; Appendix E.1, Tables 12 and 13 provide hyperparameters for image classification; and Table 14 provides hyperparameters for semantic segmentation. |
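The first-order transform in the pseudocode above (shift each source image's DFT amplitudes by the difference of target and source amplitude means, keep the source phase, invert the DFT) can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation; the function names are assumptions.

```python
import numpy as np

def fourier_amplitude_mean(batch):
    """First-order Fourier statistic: mu = (1/N) * sum_i |F(x_i)|,
    computed over a batch of shape (N, H, W)."""
    return np.abs(np.fft.fft2(batch, axes=(-2, -1))).mean(axis=0)

def fmm_first_order(x, mu_s, mu_t):
    """FMM (1st order), Step 2 of Algorithm 1: shift DFT amplitudes
    by (mu_t - mu_s), keep the source phase, and apply the inverse DFT."""
    F = np.fft.fft2(x, axes=(-2, -1))
    amp, phase = np.abs(F), np.angle(F)
    amp_fmm = amp - mu_s + mu_t                     # FMM(A) = A - mu_s + mu_t
    return np.real(np.fft.ifft2(amp_fmm * np.exp(1j * phase), axes=(-2, -1)))
```

For real-valued inputs the shifted spectrum remains Hermitian-symmetric, so the inverse DFT is real up to numerical error. The second-order variant additionally whitens with C_s^{-1/2} and recolors with C_t^{1/2}, the covariance square roots whose O(D³) cost the paper flags for high-resolution inputs.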