Progressive Template Contamination Study ========================================= This page validates that ``sys_mapping`` correctly identifies *which* templates carry systematic contamination, and *which model* (additive, multiplicative, or combined) is needed, even when only a subset of the available templates are contaminated. Motivation ---------- In practice, a systematic map set may contain many templates but only a few carry real contamination. This study answers: 1. **Template localisation** — does the algorithm assign high S/N only to the contaminated templates and low S/N to the clean ones? 2. **Model selection** — does the LRT correctly prefer the additive model when contamination is purely additive (:math:`b_i = 0`), and reject it in favour of the combined model when multiplicative contamination is present? Experimental design ------------------- .. list-table:: :widths: 30 70 :header-rows: 1 * - Parameter - Value * - NSIDE - 32 (pixel area ≈ 3.4 deg²; ~8 064 unmasked pixels) * - Total systematic templates - 4 (``synth_0`` … ``synth_3``) * - Contaminated templates :math:`k` - 1, 2, 3 (always the *first* k templates: ``synth_0`` … ``synth_{k-1}``) * - Uncontaminated templates - ``synth_k`` … ``synth_3`` (a_true = b_true = 0) * - Contamination modes - ``additive`` — :math:`a_i^{\rm true} \sim \mathcal{N}(0, 0.15)`, :math:`b_i^{\rm true} = 0` * - - ``multiplicative`` — :math:`a_i^{\rm true} = 0`, :math:`b_i^{\rm true} \sim \mathcal{N}(0, 0.15)` * - - ``combined`` — :math:`a_i^{\rm true}, b_i^{\rm true} \sim \mathcal{N}(0, 0.15)` * - Mocks per (k, mode) cell - 5 * - Total MCMC runs - 3 × 3 × 5 × 2 = 90 (both additive and combined models per mock) * - MCMC walkers / steps / burn-in - (script defaults: 110 / 400 / 80) * - S/N threshold for detection - 2.0 * - Script - ``scripts/run_mock_analysis_progressive.py`` * - Output directory - ``results/mock_analysis_progressive/`` Template localisation — S/N per template ----------------------------------------- For each mock the per-template S/N is defined as .. math:: \mathrm{S/N}_{a,i} = \frac{|\hat{a}_i|}{\sqrt{\mathrm{Var}[\hat{a}_i]}}, \qquad \mathrm{S/N}_{b,i} = \frac{|\hat{b}_i|}{\sqrt{\mathrm{Var}[\hat{b}_i]}}. A template is **detected** if its S/N exceeds 2.0. The figure shows the mean S/N ± std across 5 mocks for each (k, mode) cell. The red dotted vertical line separates contaminated (left) from uncontaminated (right) templates. Bars to the left of the line should exceed the dashed S/N = 2 threshold; bars to the right should stay below it. .. figure:: _static/results_progressive_contamination/progressive_snr_grid.png :width: 95% :align: center :alt: S/N grid across k and contamination mode **Per-template S/N grid** for all 9 (k, mode) cells. Each panel shows mean ± std across 5 mocks. Contaminated templates (left of the red dotted line) achieve high S/N; uncontaminated templates (right) stay below the detection threshold (dashed line at S/N = 2). LRT model selection -------------------- The null hypothesis is the **additive** model; the alternative is the **combined** model. The correct LRT decision is: * **additive** mode → do *not* reject null (additive model is correct) * **multiplicative** / **combined** mode → reject null (combined model needed) Detection performance --------------------- The heatmaps below summarise template-detection and model-selection performance across all 9 (k, mode) cells. Values are averaged over 5 mocks. .. figure:: _static/results_progressive_contamination/progressive_detection_rates.png :width: 90% :align: center :alt: Detection rate heatmaps **Detection rate heatmaps** for TP (left), FP (centre), and LRT correct rate (right) across all 9 (k, mode) cells. .. figure:: _static/results_progressive_contamination/progressive_lrt_summary.png :width: 70% :align: center :alt: LRT lambda distribution across cells **LRT statistic** (median λ_LR) per (k, mode) cell. Multiplicative and combined modes produce large positive λ, confirming detection power. Additive modes (where the null is true) cluster near zero. * **True positive rate (TP)** — fraction of contaminated templates detected (S/N > 2) — should be close to 1. * **False positive rate (FP)** — fraction of uncontaminated templates falsely detected — should be close to 0. * **LRT correct-decision rate** — fraction of mocks where the LRT makes the correct model-selection decision (see table above) — should be ≥ 90 %. Summary table ------------- .. csv-table:: :header: "k", "Mode", "LRT correct rate", "Mean TP", "Mean FP", "Median λ_LR" :widths: 6, 16, 18, 12, 12, 14 1, additive, "80%", "1.00", "0.20", "2.2" 1, multiplicative, "100%", "0.80", "0.40", "30.5" 1, combined, "40%", "1.00", "0.33", "3.8" 2, additive, "60%", "1.00", "0.60", "7.4" 2, multiplicative, "100%", "0.90", "0.50", "484.5" 2, combined, "80%", "1.00", "0.60", "52.9" 3, additive, "100%", "1.00", "0.40", "−240.9 †" 3, multiplicative, "100%", "0.60", "0.60", "451.1" 3, combined, "40%", "0.93", "1.00", "−153.2 †" † Negative median λ_LR indicates the additive-model log-likelihood is higher than the combined model for most mocks — a numerical instability at this pixel count (8 064 pixels) and template count. The combined model has more free parameters and can over-fit at low S/N. **Key observations:** * **Multiplicative mode** achieves 100 % LRT correct-decision rate for all k (combined model always preferred as expected), with large median λ confirming high detection power. * **Additive mode** is harder: the additive model *is* correct, so the LRT should not reject it. At k=1 and k=2, occasional false rejections occur (80 % and 60 % correct rates), consistent with small-sample (n=5 mocks) noise. At k=3 the LRT is always correct. * **Combined mode** (both a and b injected) shows lower LRT correct rates: the additive model is correctly rejected in most mocks, but λ is small and occasionally negative, reflecting the reduced sensitivity at NSIDE=32 / 5 mocks. * **True positive rate** is ≥ 0.93 in all cells — contaminated templates are consistently detected. **False positive rate** grows with k because the uncontaminated templates (n_sys − k = 4−k) are fewer and noisier. Reproduction ------------ :: OMP_NUM_THREADS=8 OPENBLAS_NUM_THREADS=8 MKL_NUM_THREADS=8 \ python scripts/run_mock_analysis_progressive.py \ --nside 32 --n-sys 4 \ --n-mocks-per-case 5 \ --snr-threshold 2.0 \ --sigma 0.15 \ --output-dir results/mock_analysis_progressive/