sys_mapping.regression

Regression-based systematic decontamination: ElasticNet and Iterative Systematics Decontamination (ISD).

Inputs: Observed overdensity delta_g_obs (shape (n_pix,)), template maps delta_t (shape (n_sys, n_pix)).

Outputs: Amplitude vector alpha_hat, per-pixel weight array weights, and diagnostic metadata.

elasticnet_contamination_fit() — L1+L2 penalised regression; requires scikit-learn >= 1.3 (pip install sys_mapping[regression]).
iterative_systematics_decontamination() — polynomial OLS expansion, no extra dependencies.
method_comparison() — run multiple methods and return a unified result dict.

Key papers: Weaverdyck & Huterer 2021; Rodríguez-Monroy et al. 2025 — see also Methods Reference.

Regression-based systematic decontamination methods.

Implements:

ElasticNet regularised regression (Weaverdyck & Huterer 2021)
Iterative Systematics Decontamination via polynomial OLS (Rodríguez-Monroy+2025)
Multi-method comparison framework (Weaverdyck & Huterer 2021)
Unified run_decontamination() interface for all 6 methods

sys_mapping.regression.elasticnet_contamination_fit(delta_g_obs, delta_t, *, l1_ratio=0.5, alpha_reg=None, cv_folds=5, max_iter=10000, fit_intercept=False)[source]

Fit systematic contamination amplitudes via ElasticNet regression.

Minimises the penalised least-squares loss (Weaverdyck & Huterer 2021, Eq. 9):

\[\mathcal{L} = \frac{1}{2N_{\rm pix}} \left\|\delta_g - \sum_i\alpha_i\,t_i\right\|_2^2 + \lambda_1\|\boldsymbol\alpha\|_1 + \lambda_2\|\boldsymbol\alpha\|_2^2\]

where the total regularisation strength and L1 fraction are controlled by alpha_reg and l1_ratio.

Parameters:

delta_g_obs (ndarray) – Observed galaxy overdensity (shape (n_pix,)).
delta_t (ndarray) – Template maps (shape (n_sys, n_pix)).
l1_ratio (float) – ElasticNet mixing parameter: 1.0 = Lasso, 0.0 = Ridge.
alpha_reg (float | None) – Regularisation strength. If None, tuned automatically via cv_folds-fold cross-validation (ElasticNetCV).
cv_folds (int) – Number of cross-validation folds used when alpha_reg=None.
max_iter (int) – Maximum coordinate descent iterations.
fit_intercept (bool) – Whether to fit a constant offset. Typically False because the overdensity field has zero mean by construction.

Returns:

alpha_hat – Fitted contamination amplitudes (shape (n_sys,)).
weights – Per-pixel systematic weights w(p) = 1 / (1 + alpha_hat · t(p)) (shape (n_pix,)).
cv_info – Dictionary with keys alpha_reg (used value), l1_ratio, and cv_scores (array of cross-validation MSE scores, or None if alpha_reg was provided explicitly).

Return type:

tuple[ndarray, ndarray, dict]

Notes

Requires scikit-learn >= 1.3. Install via:

pip install "sys_mapping[regression]"

The per-pixel weight corresponds to the DES-Y1 / Weaverdyck+2021 form \(w(p) = (1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p))^{-1}\).

Examples

>>> import numpy as np
>>> from sys_mapping import elasticnet_contamination_fit
>>> rng = np.random.default_rng(42)
>>> n_pix, n_sys = 5000, 3
>>> delta_t = rng.standard_normal((n_sys, n_pix))
>>> true_alpha = np.array([0.2, -0.1, 0.05])
>>> delta_g = true_alpha @ delta_t + rng.standard_normal(n_pix) * 0.3
>>> alpha_hat, weights, info = elasticnet_contamination_fit(
...     delta_g, delta_t, alpha_reg=1e-4)
>>> weights.shape
(5000,)
>>> np.all(weights > 0)
True

References

Weaverdyck & Huterer 2021, MNRAS 503, 5061.

sys_mapping.regression.iterative_systematics_decontamination(delta_g_obs, delta_t, *, poly_order=3, max_iter=20, tol=1e-05, lambda_poly=0.0)[source]

Iterative Systematics Decontamination (ISD) via polynomial OLS.

Expands templates to include cross-products up to poly_order, then iteratively fits OLS weights until convergence. Implements the approach of Rodríguez-Monroy et al. 2025 (DES Y6), Sec. 3.2.

The ISD cleansed overdensity after convergence is:

\[\delta_g^{\rm clean}(p) = \frac{\delta_g(p) - f_{\rm add}(p)}{1 + f_{\rm mult}(p)}\]

where \(f_{\rm add}\) and \(f_{\rm mult}\) are the additive and multiplicative systematic components estimated from polynomial template regression (here approximated by a single linear OLS pass per iteration).

Parameters:

delta_g_obs (ndarray) – Observed galaxy overdensity (shape (n_pix,)).
delta_t (ndarray) – Template maps (shape (n_sys, n_pix)).
poly_order (int) – Maximum polynomial order for template expansion. Order 1 = linear templates only; order 2 adds pairwise products; order 3 adds triple products. Automatically reduced if the expanded template matrix is underdetermined.
max_iter (int) – Maximum number of OLS iterations.
tol (float) – Convergence tolerance on the relative change in weights (||w_new - w_old|| / ||w_old||).
lambda_poly (float) – Ridge penalty applied exclusively to the polynomial (non-linear) expansion columns — the first n_sys linear columns are never penalised. A small positive value (e.g. 1e-3 * np.var(delta_g_obs)) prevents the polynomial terms from absorbing signal that belongs to the linear coefficients, which are the only ones used for the weight update. Defaults to 0.0 (plain OLS, backward-compatible).

Returns:

weights – Per-pixel systematic weights (shape (n_pix,)), all positive.
alpha_hat_all – Fitted coefficients for the expanded template matrix at the final iteration (shape (n_expanded,)).
n_iterations – Number of iterations performed (1 if converged on first pass).

Return type:

tuple[ndarray, ndarray, int]

Notes

Template expansion uses itertools.combinations_with_replacement — no new dependencies. For n_sys=5, poly_order=3 the expanded set has \(\binom{n+k}{k} - 1 = 55\) columns. If n_pix < 10 × n_expanded, poly_order is reduced with a warning.

Speed: the design matrix X is constant across iterations; only the diagonal weight matrix changes. The per-iteration cost is therefore one BLAS-3 call X^T diag(w) X of shape (n_expanded, n_expanded) plus a small lstsq solve — roughly 5–15× faster than SVD of the full (n_pix × n_expanded) matrix that the previous sqrt-weight formulation required.

Examples

>>> import numpy as np
>>> from sys_mapping import iterative_systematics_decontamination
>>> rng = np.random.default_rng(0)
>>> n_pix, n_sys = 8000, 3
>>> delta_t = rng.standard_normal((n_sys, n_pix))
>>> true_alpha = np.array([0.15, -0.08, 0.04])
>>> delta_g = true_alpha @ delta_t + rng.standard_normal(n_pix) * 0.4
>>> weights, alpha_all, n_it = iterative_systematics_decontamination(
...     delta_g, delta_t, poly_order=2, max_iter=10)
>>> weights.shape
(8000,)
>>> np.all(weights > 0)
True

References

Rodríguez-Monroy et al. 2025, arXiv:2509.07943.

sys_mapping.regression.method_comparison(delta_g_obs, delta_t, good_pixels, nside, methods=('combined_mcmc', 'additive_mcmc', 'elasticnet', 'ols'), seed=42, mcmc_kwargs=None)[source]

Run multiple decontamination methods on the same data and compare.

Implements the multi-method comparison framework of Weaverdyck & Huterer 2021 (Sec. 4). Each method returns coefficients, per-pixel weights, and a goodness-of-fit metric, enabling direct comparison.

Parameters:

delta_g_obs (ndarray) – Observed overdensity (shape (n_pix,)).
delta_t (ndarray) – Template maps (shape (n_sys, n_pix)).
good_pixels (ndarray) – Boolean mask (shape (n_full_pix,)). Passed through for spatial context but not used in computation directly (templates already masked).
nside (int) – HEALPix NSIDE of the full map.
methods (tuple[str, ...]) – Subset of {"combined_mcmc", "additive_mcmc", "elasticnet", "ols"}. "combined_mcmc" (the default first entry) is the recommended method; it infers both additive and multiplicative amplitudes jointly.
seed (int) – Random seed for MCMC initialisation.
mcmc_kwargs (dict | None) – Keyword arguments forwarded to run_mcmc() for the "additive_mcmc" method.

Returns:

Dictionary keyed by method name. Each value is a dict with:

alpha_hat or a_hat: contamination amplitudes
weights: per-pixel weights (shape (n_pix,))
chi2_residual: sum-of-squared residuals / n_pix

Return type:

results

Notes

"additive_mcmc" requires no optional dependencies (uses the built-in MCMC pipeline). "elasticnet" requires scikit-learn.

Examples

>>> import numpy as np
>>> from sys_mapping import method_comparison
>>> rng = np.random.default_rng(0)
>>> n_pix, n_sys = 3000, 2
>>> delta_t = rng.standard_normal((n_sys, n_pix))
>>> delta_g = 0.1 * delta_t[0] + rng.standard_normal(n_pix) * 0.5
>>> good = np.ones(n_pix, dtype=bool)
>>> results = method_comparison(delta_g, delta_t, good, nside=16,
...                             methods=("elasticnet", "ols"))
>>> set(results.keys()) == {"elasticnet", "ols"}
True
>>> "weights" in results["ols"]
True

References

Weaverdyck & Huterer 2021, MNRAS 503, 5061.

sys_mapping.regression.run_decontamination(method, delta_g_obs, delta_t, *, n_walkers=100, n_steps=1200, n_burn=200, seed=42, use_skewed=False, progress=False, cv_folds=5, isd_max_iter=50, isd_lambda_poly=0.0)[source]

Run a single decontamination method and return a standardised result dict.

This is the unified entry point used by both validation scripts and production pipelines. All 6 methods share the same interface so callers can iterate over a list of method names without any method-specific boilerplate.

Parameters:

method (str) – One of "OLS", "ElasticNet", "ISD-1", "ISD-3", "MCMC-add", "MCMC-comb".
delta_g_obs (ndarray) – Observed galaxy overdensity at unmasked pixels, shape (n_pix,).
delta_t (ndarray) – Template maps in the original (non-rotated) basis, shape (n_sys, n_pix). MCMC methods rotate internally.
n_walkers (int) – MCMC sampler settings.
n_steps (int) – MCMC sampler settings.
n_burn (int) – MCMC sampler settings.
seed (int) – Random seed for MCMC walker initialisation.
use_skewed (bool) – Use skew-normal likelihood for MCMC-comb. Ignored for all other methods.
progress (bool) – Show emcee progress bar (MCMC methods only).
cv_folds (int) – Cross-validation folds for ElasticNet (when alpha_reg is auto).
isd_max_iter (int) – Maximum iterations for ISD methods.
isd_lambda_poly (float) – Ridge penalty on the polynomial expansion columns for ISD-3 (see iterative_systematics_decontamination()). A value of 1e-3 * np.var(delta_g_obs) is recommended when ISD-3 converges to implausible solutions. Has no effect for ISD-1 (poly_order=1 has no polynomial-only columns). Default 0.0 (plain OLS).

Returns:

dict with keys
All methods – a_hat (n_sys,), b_hat (n_sys,), weights (n_pix,), elapsed_s (float).
MCMC methods additionally – flat_chain, sampler, a_rot, b_rot, cov_a_rot, cov_b_rot, cov_a, cov_b, R, acceptance_fraction, sigma_hat.
ISD methods additionally – n_iterations (int).
ElasticNet additionally – cv_info (dict).
Non-applicable keys are None.

Return type:

dict

Notes

Weight convention

OLS / ElasticNet / ISD-1 / ISD-3 / MCMC-add: w(p) = 1 / (1 + a_hat @ t(p)) — additive weight.
MCMC-comb: w(p) = (1 + δ_g_clean(p)) / (1 + δ_g_obs(p)) — exact pixel-level inverse of the contamination weight; cancels weight_cont exactly when (a_hat, b_hat) equal the true parameters.

References

Berlfein et al. 2024, MNRAS 531, 4954. Weaverdyck & Huterer 2021, MNRAS 503, 5061. Rodríguez-Monroy et al. 2025, arXiv:2509.07943.