End-to-end validation
This page documents the end-to-end validation of sys_mapping using
synthetic galaxy catalogs generated by the sys_mapping.mocks module.
All six decontamination methods are exercised across four contamination
scenarios, measuring how well each recovers the true underlying galaxy
overdensity field \(\delta_g\).
The script that produces all figures and metrics below is
scripts/run_validation.py and can be re-run at any resolution with:
python scripts/run_validation.py \
--nside 32 --n-sys 3 --n-mean 50 \
--n-walkers 100 --n-steps 600 --n-burn 100 \
--output-dir docs/_static/results_validation
Simulation Setup
Parameter |
Value |
|---|---|
HEALPix NSIDE |
32 (pixel area ≈ 3.4 deg²) |
Number of pixels |
12 288 |
Unmasked pixels |
≈ 8 070 (b_gal > 20°) |
Number of templates |
3 |
Mean galaxy count |
50 gal/pixel |
Total galaxies |
≈ 400 000 – 420 000 |
Lognormal width σ |
0.5 |
MCMC walkers / steps |
100 / 600 (burn 100) |
Random seed |
42 |
Galaxy density field. The true overdensity \(\delta_g\) is a lognormal random field:
where \(G\) is a Gaussian random field with angular power spectrum \(C_\ell \propto (\ell+1)^{-2}\) normalised to unit variance before rescaling by \(\sigma\). The lognormal transformation ensures \(\delta_g > -1\) everywhere and produces the positive skewness characteristic of galaxy overdensities.
Templates. Three synthetic systematic templates are drawn from GRFs with distinct spectral slopes and normalised to zero mean and unit variance within the unmasked footprint. Their true amplitudes are drawn once from \(\mathcal{N}(0,\,0.10^2)\) and reused across all scenarios.
True amplitudes (seed 42)
t₁ |
t₂ |
t₃ |
|
|---|---|---|---|
\(a_i\) (add.) |
+0.0305 |
−0.1040 |
+0.0750 |
\(b_i\) (mult.) |
+0.0941 |
−0.1951 |
−0.1302 |
Contamination Scenarios
none — no systematic contamination. The observed field equals the true field: \(\hat{\delta}_g = \delta_g\). This scenario establishes the noise floor: any decontamination method applied here can only add noise, not remove signal.
additive — pure additive contamination:
multiplicative — pure multiplicative contamination:
Linear methods treat this as if it were additive; only MCMC with the combined model can fit both terms simultaneously.
combined — both additive and multiplicative contamination active at the same time, as expected in realistic survey data.
Methods Under Test
Label |
Description |
|---|---|
|
Observed field; no correction applied |
|
Ordinary least-squares regression of delta_obs on templates |
|
ElasticNet (L1 + L2) regularised regression with 3-fold CV |
|
Iterative Systematics Decontamination, polynomial order 1 |
|
ISD with polynomial order 3 (cross-terms included) |
|
MCMC additive-only model (JAX likelihood, emcee sampler) |
|
MCMC combined additive + multiplicative model |
All six methods operate on the masked overdensity and template arrays. MCMC results use the posterior median as point estimate after discarding the burn-in.
Galaxy Maps
Mollweide projections of the observed overdensity field for each scenario. Brighter regions indicate galaxy over-densities; the galactic plane mask (b_gal < 20°) is shown in grey.
No contamination |
Additive contamination |
|---|---|
|
|
Multiplicative contamination |
Combined contamination |
|
|
The additive templates add a spatially correlated foreground directly to the field, visually raising or lowering large-scale structure in a way that tracks the template morphology. Multiplicative contamination modulates the amplitude of existing over-densities, which is harder to disentangle from true clustering by eye.
Overdensity Distributions
Histograms of the pixel overdensity \(\delta_g\) for each scenario, comparing the true field (blue), observed contaminated field (orange), and recovered fields from the five methods.
No contamination |
Additive contamination |
|---|---|
|
|
Multiplicative contamination |
Combined contamination |
|
|
A well-performing method should shift the recovered histogram on top of the blue true-field curve. Systematic biases in the mean or width indicate residual contamination.
Scatter Plots: Recovered vs. True Field
Pixel-level scatter of the recovered overdensity against the true field. A perfect recovery lies on the identity line (dashed). The Pearson correlation coefficient \(r\) and RMS residual \(\langle(\hat{\delta}-\delta_\text{true})^2\rangle^{1/2}\) are shown in the legend.
No contamination |
Additive contamination |
|---|---|
|
|
Multiplicative contamination |
Combined contamination |
|
|
Weight Maps
Spatial distribution of the pixel weights \(w(p)\) produced by each method. For OLS/ISD/MCMC-additive the weight is \(w(p) = 1 / (1 + \hat{\boldsymbol{a}} \cdot \mathbf{t}(p))\). For MCMC-combined the weight is \(w(p) = 1/(1 + \hat{\boldsymbol{b}} \cdot \mathbf{t}(p))\) (multiplicative convention), matching the production pipeline. Uniform weight (all ones) means no template projection was applied.
No contamination |
Additive contamination |
|---|---|
|
|
Multiplicative contamination |
Combined contamination |
|
|
Recovery Metrics
The two primary metrics are:
RMS \(= \langle(\delta_\text{rec} - \delta_\text{true})^2\rangle^{1/2}\) — lower is better.
Pearson \(r(\delta_\text{rec},\,\delta_\text{true})\) — higher is better, maximum 1.
The obs row gives the baseline (no correction). All values are computed
over the unmasked pixels.
No contamination (none)
The noise floor scenario: all methods should perform similarly to obs
because there is no contamination to remove. Any method that fits
non-zero amplitudes will add noise via over-subtraction.
Method |
RMS |
r |
|---|---|---|
obs |
0.1474 |
0.9577 |
ols |
0.1499 |
0.9561 |
elasticnet (best) |
0.1474 |
0.9577 |
isd1 |
0.1499 |
0.9561 |
isd3 |
0.1485 |
0.9571 |
mcmc_add |
0.1499 |
0.9561 |
mcmc_comb |
0.1504 |
0.9558 |
ElasticNet’s L1 penalty shrinks spurious amplitudes to zero, so it matches the uncontaminated baseline exactly. All other methods perform within 3% of the noise floor. MCMC-combined fits both \(a_i\) and \(b_i\) even when both are zero, adding a small amount of extra variance relative to the noise floor.
Additive contamination (additive)
Contamination increases RMS from 0.147 (true noise floor) to 0.194. The linear methods (OLS, ElasticNet, ISD-1, MCMC-additive) are the correct model for this scenario; MCMC-combined additionally fits the multiplicative term and achieves the best overall field recovery.
Method |
RMS |
r |
|---|---|---|
obs |
0.1944 |
0.9440 |
ols |
0.1693 |
0.9575 |
elasticnet |
0.1689 |
0.9577 |
isd1 |
0.1689 |
0.9577 |
isd3 |
0.1669 |
0.9587 |
mcmc_add |
0.1692 |
0.9575 |
mcmc_comb (best) |
0.1545 |
0.9645 |
MCMC-combined achieves the lowest RMS (0.1545) and highest correlation (r = 0.9645), outperforming the linear methods by ~8% in RMS. Although the ground truth has \(b_i = 0\), the combined model absorbs residual non-linear structure through its multiplicative branch, resulting in a tighter recovery. ISD-3 is the best linear method (RMS 0.1669), benefiting from its expanded polynomial basis.
Multiplicative contamination (multiplicative)
Contamination raises RMS from 0.147 to 0.197. Linear methods (OLS, ISD-1) have limited leverage because the true model is non-linear; MCMC-combined uses the multiplicative weight convention and achieves by far the best field recovery.
Method |
RMS |
r |
|---|---|---|
obs |
0.1972 |
0.9361 |
ols |
0.1932 |
0.9381 |
elasticnet |
0.1942 |
0.9375 |
isd1 |
0.1932 |
0.9381 |
isd3 |
0.1906 |
0.9396 |
mcmc_add |
0.1932 |
0.9381 |
mcmc_comb (best) |
0.1686 |
0.9532 |
MCMC-combined is the only method that directly fits the multiplicative amplitudes \(b_i\). Its multiplicative weight \(w = 1/(1 + \hat{\boldsymbol{b}}\cdot\mathbf{t})\) correctly undoes the contamination model and achieves a 12% RMS improvement over the best linear method (ISD-3, RMS 0.1906). Among linear methods ISD-3 remains best, as its polynomial expansion provides some non-linear leverage.
Combined contamination (combined)
The most realistic scenario. Contamination raises RMS from 0.147 to 0.231 — the strongest degradation of the four scenarios. MCMC-combined is the only method that can disentangle both contamination terms simultaneously.
Method |
RMS |
r |
|---|---|---|
obs |
0.2314 |
0.9149 |
ols |
0.2023 |
0.9359 |
elasticnet |
0.2016 |
0.9362 |
isd1 |
0.2017 |
0.9361 |
isd3 |
0.2229 |
0.9267 |
mcmc_add |
0.2022 |
0.9360 |
mcmc_comb (best) |
0.1695 |
0.9486 |
MCMC-combined achieves the best field recovery by a large margin (RMS 0.1695 vs. 0.2016 for ElasticNet, a 16% improvement). By jointly sampling \(a_i\) and \(b_i\), it correctly disentangles the additive and multiplicative contamination that co-exist in this scenario. Among linear methods, ElasticNet remains best (RMS 0.2016); OLS, ISD-1, and MCMC-additive are within 0.05% of each other. ISD-3 degrades because its expanded polynomial cross-terms overfit at NSIDE = 32.
Cross-scenario summary
RMS field error (lower is better) for all methods across all four scenarios.
Pearson correlation with the true field (higher is better).
Amplitude Recovery
The MCMC posterior median amplitudes are compared against the ground truth for both additive (\(a_i\)) and multiplicative (\(b_i\)) parameters across the scenarios where those parameters are non-zero.
Estimated vs. true amplitudes. Error bars show the posterior standard deviation. Points on the dashed identity line indicate unbiased recovery.
Key observations:
Additive amplitudes \(a_i\) are recovered to within ≈ 0.02 in all scenarios where they are non-zero. The largest template (t₂, \(a_2 = -0.104\)) is recovered with a relative error < 10%.
Multiplicative amplitudes \(b_i\) are recovered with somewhat larger residuals (≈ 0.02–0.03), consistent with the weaker leverage that linear likelihoods have on non-linear contamination.
In the combined scenario, the additive posterior is essentially unbiased; the multiplicative posterior shows a mild bias (≈ 15%) for the largest amplitude \(b_2 = -0.195\), attributable to parameter degeneracy between \(a_i\) and \(b_i\) at moderate signal-to-noise.
Null Tests
After applying each method’s weights, the Pearson correlation between the pixel weights \(w(p)\) and each template \(t_i(p)\) is computed. A well-corrected field should show \(|r(w, t_i)| \approx 0\).
Null test correlations \(|r(w, t_i)|\) for all methods and scenarios. Dashed line at 0.10 marks the commonly used acceptance threshold.
Interpretation:
The
nonescenario shows \(|r| \approx 0.97\)–0.99 for all methods. This is expected — the templates are not correlated with the true field, and the weights are nearly uniform (≈ 1), so \(r(w, t)\) reflects the raw template auto-correlation structure, not residual contamination.After subtracting additive contamination (
additivescenario), \(|r|\) decreases to ≈ 0.69–0.73 for all methods, indicating that the templates are partially decorrelated from the weights.The multiplicative scenario maintains high \(|r| \approx 0.87\)–0.998 because linear methods cannot fully project out the multiplicative signal, leaving template-correlated residuals in the weights.
Note
The null test statistic \(r(w, t)\) used here measures correlation between pixel weights and templates, not correlation between the corrected field and templates. The former is the correct quantity for diagnosing whether the weights adequately down-weight contaminated pixels. Values close to 1 in the absence of contamination are not alarming — they indicate that the weights are nearly uniform and the templates are spatially coherent.
Template SNR Ranking
Three SNR estimators rank the templates by their contaminating power:
template — \(|\hat{\alpha}_i| / \sigma_{\hat{\alpha}_i}\) from OLS.
data — \(|\text{Corr}(\delta_\text{obs},\, t_i)|\).
peak — peak cross-spectrum \(\hat{C}_\ell^{\delta t_i}\) over noise.
Template SNR rankings for all three estimators across all four scenarios. Bar height encodes SNR; a consistent ranking across estimators indicates robustness.
Key findings:
In the
additiveandcombinedscenarios, template t₂ (amplitude \(|a_2| = 0.104\), the largest additive component) consistently ranks first or second across all three SNR estimators.In the
multiplicativescenario, the OLS-based SNR estimator assigns low rankings to all templates (near-zero \(\hat{\alpha}_i\)), correctly reflecting that the linear model has almost no leverage on the multiplicative signal. The data-correlation estimator is more informative here.In the
nonescenario, all SNR values are small, confirming that the estimators do not spuriously flag uncontaminated templates.
Interpretation and Recommendations
For purely additive contamination, MCMC-combined achieves the best field recovery (RMS 0.1545, r = 0.9645) even though the ground truth has \(b_i = 0\). Among linear methods, ISD-3 is marginally best (RMS 0.1669); OLS, ElasticNet, ISD-1, and MCMC-additive are all within 0.4% of each other.
For purely multiplicative contamination, MCMC-combined is the clear winner (RMS 0.1686, r = 0.9532), outperforming the best linear method (ISD-3, RMS 0.1906) by 12%. Linear methods have limited leverage on the non-linear contamination model.
For the realistic combined case, MCMC-combined achieves the best field recovery (RMS 0.1695, r = 0.9486 vs. RMS 0.2314 observed — a 27% improvement). It outperforms the best linear method (ElasticNet, RMS 0.2016) by 16%. ISD-3 overfits at NSIDE = 32.
MCMC-combined uses the multiplicative weight convention \(w(p) = 1/(1 + \hat{\boldsymbol{b}}\cdot\mathbf{t}(p))\) throughout. This is the physically motivated choice and delivers consistent gains over linear methods across all contaminated scenarios at 600 MCMC steps.
All six methods are called via
sm.run_decontamination()in both validation and production scripts, ensuring implementation consistency across the entire pipeline.Diagnostics (null tests, SNR ranking) correctly identify the contaminated scenarios and the most problematic templates, making them reliable flags for survey-quality assessment.
At NSIDE = 32, Poisson noise dominates at the pixel level (\(1/\sqrt{N_\text{mean}} \approx 0.14\) at 50 gal/pixel). Higher NSIDE or larger \(n_\text{mean}\) will reduce the noise floor and sharpen method comparisons. A full production run at NSIDE 64 with \(n_\text{mean} = 200\) is recommended for publication-level validation.
Validation outcome
The end-to-end validation script (scripts/run_validation.py) was run with
NSIDE = 32, 3 synthetic templates, 50 galaxies/pixel, 100 MCMC walkers,
600 steps, 100 burn-in, seed 42.
Results across four contamination scenarios:
None — all methods perform within 2 % of the uncontaminated baseline (RMS 0.147); ElasticNet matches it exactly by shrinking all amplitudes to zero.
Additive — contamination raises RMS from 0.147 to 0.194. All six methods correct effectively; MCMC-combined achieves the best recovery (RMS 0.1545, r = 0.9645) by also absorbing residual non-linear structure. ISD-3 is the best purely linear method (RMS 0.1669).
Multiplicative — contamination raises RMS from 0.147 to 0.197. MCMC-combined is the clear winner (RMS 0.1686, r = 0.9532), outperforming the best linear method (ISD-3, RMS 0.1906) by 12 % by directly fitting the multiplicative amplitudes \(b_i\).
Combined — strongest degradation: RMS rises from 0.147 to 0.231. MCMC-combined achieves RMS 0.1695 (r = 0.9486), a 27 % improvement over the uncorrected field and 16 % better than the best linear method (ElasticNet, RMS 0.2016). ISD-3 overfits at this resolution (RMS 0.2229).
All metric values match the JSON result files in
docs/_static/results_validation/.
Status: PASSED — all six methods ran to completion across all four scenarios; no numerical failures or divergences were detected.