Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Unbiased Estimation for Partially Observed Diffusions

Authors: Jeremy Heng, Jeremie Houssineau, Ajay Jasra

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate various aspects of our method on an Ornstein-Uhlenbeck model, a logistic diffusion model for population dynamics, and a neural network model for grid cells."
Researcher Affiliation | Academia | Jeremy Heng (EMAIL), ESSEC Business School; Jeremie Houssineau (EMAIL), Division of Mathematical Sciences, Nanyang Technological University; Ajay Jasra (EMAIL), School of Data Science, Chinese University of Hong Kong, Shenzhen
Pseudocode | Yes | Algorithm 1: Conditional particle filter (CPF) at parameter θ ∈ Θ and discretization level l ∈ ℕ₀... Algorithm 2: Two coupled CPF (2-CCPF) at parameter θ ∈ Θ and discretization level l ∈ ℕ₀... Algorithm 3: Maximal coupling of two resampling distributions R(w^{1:N}) and R(w̄^{1:N}) (2-Maximal)... Algorithm 4: Four coupled CPF (4-CCPF) at parameter θ ∈ Θ and discretization levels l − 1 and l ∈ ℕ... Algorithm 5: Maximal coupling of the maximal couplings R(w_{l−1}^{1:N}, w_l^{1:N}) and R(w̄_{l−1}^{1:N}, w̄_l^{1:N}) (4-Maximal)... Algorithm 6: Multilevel CPF (ML-CPF) at parameter θ ∈ Θ and discretization levels l − 1 and l ∈ ℕ
Open Source Code | Yes | An R package to reproduce all numerical results can be found at https://github.com/jeremyhengjm/UnbiasedScore.
Open Datasets | Yes | "Next we consider an application from population ecology to model the dynamics of a population of red kangaroos (Macropus rufus) in New South Wales, Australia. Figure 5a displays data y_{t_1}, . . . , y_{t_P} ∈ ℕ₀² from Caughley et al. (1987)... neural network model for single neurons to analyze grid cells spike data (https://www.ntnu.edu/kavli/research/grid-cell-data) recorded in the medial entorhinal cortex of rats that were running on a linear track (Hafting et al., 2008)."
Dataset Splits | No | The paper uses either simulated data or real-world observed data (e.g., population dynamics, grid cell spike data) for parameter inference. It discusses the number of observations (e.g., T = 25, P = 41) but does not specify any explicit training, validation, or test dataset splits, as the goal is parameter inference on the entire dataset rather than model evaluation through splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used to run the experiments. It focuses on algorithmic performance and outcomes without mentioning the underlying computational resources.
Software Dependencies | No | The paper mentions an "R package to reproduce all numerical results" (https://github.com/jeremyhengjm/UnbiasedScore), indicating that R is used. However, it does not specify the version of R or any other software libraries, packages, or their respective version numbers.
Experiment Setup | Yes | Section 5.1: "N = 128 particles", "b = 90%-quantile(τ_θ^l) at level l = 3 and I = b (naive); b = 90%-quantile(τ_θ^l) at level l = 3 and I = b (simple); and b = 90%-quantile(τ_θ^l) and I = 10b (time-averaged)". Section 5.2: "N = 256 particles and adaptive resampling", "burn-in of b = 90%-quantile(τ_θ^l) at level l = 3", "prior distribution for the transformed parameters (θ₁, log θ₂, log θ₃, log θ₄) as N_{d_θ}(µ₀, Σ₀), with µ₀ = (0, 1, 1, 1) and Σ₀ = diag(5², 2², 2², 2²)", "learning rate in (5) be component-dependent by taking ε_m = diag((100 + m)^{−0.6}(10^{−2}, 10^{−2}, 10^{−4}, 10^{−2}))". Section 5.3: "N = 256 particles and adaptive resampling", "burn-in of b = 100", "constant learning rate of ε_m = 10^{−3}".
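The Research Type row quotes an Ornstein-Uhlenbeck example, and the Pseudocode row refers to algorithms indexed by a discretization level l. A minimal sketch of what such a level-l discretization looks like is an Euler-Maruyama scheme with step size 2^{−l}; the drift/diffusion parameterization below (rho, mu, sigma) is a hypothetical illustration in Python, not the authors' R implementation.

```python
import numpy as np

def simulate_ou(theta, T, level, rng):
    """Euler-Maruyama discretization of an Ornstein-Uhlenbeck SDE
    dX_t = rho * (mu - X_t) dt + sigma dW_t on [0, T],
    with step size 2**-level (larger levels give finer grids).
    theta = (rho, mu, sigma) is a hypothetical parameterization."""
    rho, mu, sigma = theta
    dt = 2.0 ** -level
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = mu  # start at the stationary mean for illustration
    for k in range(n):
        x[k + 1] = x[k] + rho * (mu - x[k]) * dt + sigma * np.sqrt(dt) * rng.normal()
    return x
```

Doubling `level` halves the step size, which is the sense in which the multilevel algorithms above pair simulations at consecutive levels l − 1 and l.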
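Algorithms 3 and 5 in the Pseudocode row are maximal couplings of resampling distributions. The standard rejection construction of a maximal coupling of two discrete distributions can be sketched as follows; this is a generic illustration of the technique, not the paper's 2-Maximal/4-Maximal code.

```python
import numpy as np

def maximal_coupling(p, q, rng):
    """Sample a pair (X, Y) with X ~ p and Y ~ q such that
    P(X = Y) = sum_i min(p_i, q_i), the maximal coupling probability.
    p, q: 1-D arrays of normalized probabilities over {0, ..., N-1}."""
    n = len(p)
    x = rng.choice(n, p=p)
    # Accept X = Y with probability min(1, q(x)/p(x)).
    if rng.uniform() * p[x] <= q[x]:
        return x, x
    # Otherwise sample Y from the residual part of q (where q > p).
    while True:
        y = rng.choice(n, p=q)
        if rng.uniform() * q[y] > p[y]:
            return x, y
```

When p and q are the normalized particle weights of two coupled filters, drawing ancestor indices this way keeps the two particle systems identical as often as the weights allow.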
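The Experiment Setup row quotes a component-dependent, polynomially decaying learning rate ε_m = diag((100 + m)^{−0.6}(10^{−2}, 10^{−2}, 10^{−4}, 10^{−2})). A short sketch of that schedule and the corresponding stochastic gradient update it would drive, assuming a plain ascent step θ_{m+1} = θ_m + ε_m · ∇ (the update form is an assumption; see the paper's equation (5) for the exact recursion):

```python
import numpy as np

def step_sizes(m):
    """Decaying, component-wise step sizes matching the quoted schedule
    eps_m = (100 + m)**-0.6 * diag(1e-2, 1e-2, 1e-4, 1e-2),
    returned as the diagonal vector."""
    return (100.0 + m) ** -0.6 * np.array([1e-2, 1e-2, 1e-4, 1e-2])

def sga_step(theta, grad, m):
    """One stochastic gradient ascent update with the decaying step sizes."""
    return theta + step_sizes(m) * grad
```

The exponent 0.6 lies in (0.5, 1], the usual Robbins-Monro range, and the much smaller third component slows the update of one transformed parameter relative to the others.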