Methods Reference
This page maps every method implemented in sys_mapping to its source
publication, provides the key equations with all notation spelled out, and
gives physical motivation and practical guidance on when to use each method.
For notation conventions see Overview; for bibliographic detail on each
reference see Bibliography.
Core contamination model
Reference: Berlfein et al. 2024 (MNRAS 531, 4954), Eq. 11–13.
Module: sys_mapping.contamination
Physical motivation. A photometric galaxy survey assigns each galaxy to a HEALPix pixel based on its observed position. The measured number count \(n_g(p)\) per pixel can differ from the true number count \(n_g^{\rm true}(p)\) for two physically distinct reasons:
Additive contamination: a contaminant contributes galaxy-like objects to the catalog, or masks real galaxies over a spatially coherent pattern. Examples: stellar contamination (stars misclassified as galaxies add to counts), fibre allocation inefficiency (sparse regions are under-observed). These shift \(\delta_g\) by a fixed amount proportional to \(t_i\).
Multiplicative contamination: a varying selection efficiency scales galaxy counts up or down relative to the true density. Examples: depth variations (seeing, sky background, airmass) modulate the survey completeness as a function of position. These multiply \(1+\delta_g\) by a factor proportional to \(1 + b_i t_i\).
The combined forward model is:
where \(\delta_g(p) = n_g^{\rm true}(p)/\bar{n}_g - 1\) is the true overdensity, \(\hat\delta_g(p)\) is the observed overdensity, and \(t_i(p)\) is systematic template \(i\) normalised to zero mean and unit variance over the footprint. The amplitudes \(a_i\) (additive) and \(b_i\) (multiplicative) are the free parameters to be inferred.
Three nested models are obtained by constraining these amplitudes:
Model |
Free parameters |
Constraint |
Use case |
|---|---|---|---|
|
\(a_i\), \(\sigma\) |
\(b_i = 0\) |
Stellar contamination, photometric bias |
|
\(a_i\), \(\sigma\) |
\(b_i = a_i\) |
Depth / completeness variations |
|
\(a_i, b_i\), \(\sigma\) |
none |
Unknown contamination type; general case |
Inverting the model. The clean overdensity can be recovered from the observed one via:
This inversion is valid only when \(1 + \sum_i b_i t_i(p) > 0\) in every pixel (i.e. the multiplicative modulation never entirely suppresses the galaxy density). For typical values \(|b_i| \lesssim 0.3\) and unit-normalised templates this condition is satisfied almost everywhere.
Weight-based correction for linear methods. For all additive-type methods (OLS, ElasticNet, ISD, MCMC-additive), the corrected field is computed as (Weaverdyck & Huterer 2020, Eq. 46-47):
This is exact for multiplicative contamination and equals the simpler subtraction \(\hat\delta_g - \hat{\boldsymbol\alpha}\cdot\mathbf{t}\) at first order in \(\hat{\boldsymbol\alpha}\cdot\mathbf{t}\). Using the weight form preserves positivity (\(1 + \delta_g^{\rm corr} > 0\)) and is the convention in the reference literature (Weaverdyck & Huterer 2020; Rodríguez-Monroy et al. 2025); it is applied uniformly throughout this package. For MCMC-combined the full inversion formula above is used instead.
Technical implementation.
All arithmetic is performed on JAX arrays (jax.Array) to enable
JIT compilation and automatic differentiation. The forward and inverse
models are implemented as pure functions with no mutable state:
Forward model (apply_contamination()):
mult_term = jnp.einsum("i,ij->j", b, delta_t) # Σ b_i t_i(p)
add_term = jnp.einsum("i,ij->j", a, delta_t) # Σ a_i t_i(p)
delta_g_obs = delta_g * (1.0 + mult_term) + add_term
Both Einstein sums are single BLAS-1 operations of cost
\(O(n_s \times N)\). At n_pix = 10\,000, n_sys = 5 the
forward call runs in approximately 830 μs after JIT warm-up
(float32 accumulation; max absolute roundtrip error < 1e-5).
Inverse model (invert_contamination()):
adds an element-wise division (≈ 1 730 μs at the same size).
Parameter vector layout (pack_params(),
unpack_params()):
the flat float64 vector fed to emcee / scipy optimisers has layout
[a_0, …, a_{n_s-1}, ← always present
b_0, …, b_{n_s-1}, ← combined model only
sigma, ← always
gamma] ← skew-normal only
Slice operations preserve values to machine precision. The
match/case dispatch (Python 3.10+) has zero runtime cost.
Functions:
apply_contamination() (forward model),
invert_contamination() (recover clean field),
pack_params() / unpack_params()
(parameter vector layout),
n_free_params(),
compute_two_point_correction()
Gaussian and skew-normal likelihood
Reference: Berlfein et al. 2024, Eq. 17–18.
Module: sys_mapping.likelihood
Physical motivation. To infer the contamination amplitudes \((a_i, b_i)\) from data, we need a likelihood model for the distribution of the clean overdensity field \(\delta_g(p)\). At the pixel level (with \(\bar{n} \gtrsim 30\) galaxies per pixel), Poisson fluctuations in the count map are approximately Gaussian. The central limit theorem ensures that the overdensity in each pixel approaches a Gaussian as \(\bar{n} \to \infty\).
Gaussian likelihood. Assuming \(\delta_g(p) \stackrel{\rm iid}{\sim} \mathcal{N}(0, \sigma^2)\) for each unmasked pixel \(p\), the log-likelihood is (Eq. 17):
Here \(\boldsymbol\Theta = (a_1,\ldots,a_{n_s},b_1,\ldots,b_{n_s},\sigma)\), \(N\) is the number of unmasked pixels, and the Jacobian term accounts for the change of variable from \(\hat\delta_g\) to \(\delta_g\). For the additive model (\(b_i=0\)) the Jacobian term vanishes.
Skew-normal likelihood. For a lognormal galaxy field, \(\delta_g = e^{G-\sigma_G^2/2}-1\) has positive skewness \(s \approx 3\sigma_G + \sigma_G^3 > 0\). Fitting a Gaussian likelihood to a positively-skewed field introduces a systematic bias in the inferred amplitudes. The skew-normal extension (Eq. 18) corrects for this by adding a log-CDF term:
where \(\Phi(z) = \tfrac{1}{2}[1+{\rm erf}(z/\sqrt{2})]\) is the standard-normal CDF, \(\gamma > 0\) is the skewness shape parameter (fitted alongside \(a_i, b_i, \sigma\)), and \(\xi = -\sigma\delta\sqrt{2/\pi}\) with \(\delta = \gamma/\sqrt{1+\gamma^2}\) is a mean-shift that ensures \(\mathbb{E}[\delta_g]=0\) is maintained. Setting \(\gamma=0\) recovers the Gaussian exactly.
Technical implementation.
make_log_likelihood() is a factory: it
captures n_sys, model, and use_skewed as closed-over Python
integers/booleans that become compile-time constants during JAX tracing.
The returned function is decorated with @jax.jit and compiled on the
first call (~**227 ms** compile latency; reuse the function object across
all MCMC iterations).
Gaussian path (use_skewed=False):
mult_term = jnp.einsum("i,ij->j", b, delta_t)
add_term = jnp.einsum("i,ij->j", a, delta_t)
log_jac = jnp.sum(jnp.log(jnp.abs(1.0 + mult_term))) # Jacobian
delta_g_clean = (delta_g_obs - add_term) / (1.0 + mult_term)
log_gauss = -0.5*N*log(2πσ²) - (1/2σ²)*Σ delta_g_clean²
return log_gauss - log_jac
Execution time after JIT warm-up: ~284 μs/call at
\(N = 10\,000\), \(n_s = 5\) on CPU.
At zero contamination, matches scipy.stats.norm.logpdf sum to
rtol = 1e-5.
Skew-normal path (use_skewed=True):
delta_param = gamma / jnp.sqrt(1.0 + gamma**2) # δ = γ/√(1+γ²)
xi = -sigma * delta_param * jnp.sqrt(2.0 / jnp.pi)
residual = delta_g_clean - xi
z = gamma * residual / sigma
log_skew = N*log(2) + jnp.sum(log_ndtr(z)) # Σ log Φ(γ r/σ)
return log_gauss + log_skew - log_jac
The log-CDF \(\ln\Phi(z)\) is evaluated via
jax.scipy.special.log_ndtr, which is numerically stable for large
negative \(z\) (where log(1+erf(z/√2)) would underflow).
Execution time: ~592 μs/call (extra log_ndtr sum over \(N\)
pixels). Setting gamma = 0 recovers the Gaussian to atol = 1e-6.
Function:
make_log_likelihood()
MCMC inference
Reference: Berlfein et al. 2024, Sec. 4; Goodman & Weare 2010 (emcee sampler).
Module: sys_mapping.inference
Physical motivation. Given the likelihood \(\mathcal{L}(\boldsymbol\Theta)\) and a prior \(\pi(\boldsymbol\Theta)\), Bayes’ theorem gives the posterior:
The prior on each \(a_i\) and \(b_i\) is a zero-mean Gaussian with unit variance, \(a_i \sim \mathcal{N}(0, 1)\), reflecting the expectation that contamination amplitudes are at most of order unity. The noise parameter \(\sigma > 0\) has a half-normal prior. In the rotated template basis (see Template PCA rotation) the prior factorises across parameters, improving MCMC mixing.
sys_mapping uses the affine-invariant ensemble sampler
(emcee, Foreman-Mackey et al. 2013).
This sampler uses \(n_{\rm walkers}\) parallel chains (walkers) that
share information via stretch and walk moves. Key properties:
Works efficiently for posteriors with correlated parameters.
Automatically adapts to the posterior shape.
The only tuning parameter is the number of walkers (recommendation: \(n_{\rm walkers} \geq 4\,n_{\rm dim}\), typical choice \(n_{\rm walkers} = 150\) for \(n_{\rm dim} = 7\)).
The MAP estimate is taken as the posterior median of the flat chain (after discarding the burn-in phase). The posterior variance is estimated from the chain covariance matrix.
Recommended settings:
Parameter |
Default / recommendation |
|---|---|
|
150 (>= 4 x n_dim) |
|
1500 (beyond convergence) |
|
300 (discard early transient) |
Convergence check |
|
Technical implementation.
Log-probability wrapper (make_log_prob()):
converts the JAX log-likelihood into an emcee-compatible Python callable.
delta_g_obs and delta_t are converted to jnp.asarray once at
construction time to avoid repeated host–device copies. A hard
sigma <= sigma_min (default 1e-6) guard returns -inf
without calling JAX, short-circuiting the dispatch overhead.
Vectorised mode (vectorize=True): builds a single
jax.jit(jax.vmap(log_likelihood)) kernel that accepts the full
(n_walkers, n_dim) position matrix and returns (n_walkers,)
log-probabilities in one GPU dispatch per step. This is the primary
source of GPU speedup and requires emcee >= 3.0.
Walker initialisation in run_mcmc():
p0[:, :n_cont] = rng.normal(0.0, 1e-3, (n_walkers, n_cont))
p0[:, n_cont] = std(delta_g_obs) * (1 + rng.normal(0.0, 0.01, n_walkers))
# gamma (skew) initialised to N(0, 0.1) when use_skewed=True
The \(10^{-3}\) scatter in contamination amplitudes places walkers
near zero (the prior centre) while providing enough diversity to explore
the posterior within the first ~50 steps. sigma is initialised near
the observed overdensity standard deviation, which is a good prior for the
noise level.
Point estimate: get_mle_params() returns
np.median(flat_chain, axis=0) — a robust estimator that is
insensitive to outlier walkers and converges to the true parameter at
\(O(1/\sqrt{n_{\rm samples}})\).
Covariance (get_param_covariance_from_chain()):
slices the flat chain by column rather than unpacking sample by sample:
a_arr = flat_chain[:, :n_sys] # (n_samples, n_sys)
b_arr = flat_chain[:, n_sys : 2*n_sys] # combined only
cov_a = np.cov(a_arr, rowvar=False) # (n_sys, n_sys)
The full covariance (not just the diagonal) is needed to correctly
propagate uncertainty through the PCA back-transformation
cov_orig = R.T @ cov_rot @ R.
Performance. Total wall time scales as
\(O(n_{\rm walkers} \times n_{\rm steps} \times N_{\rm pix} \times n_s)\).
A typical run with n_walkers = 250, n_steps = 1500,
\(N_{\rm pix} \approx 50\,000\), \(n_s = 5\) takes a few minutes on CPU;
the JAX likelihood is compiled once (~227 ms) then runs at ~284 μs/eval.
Functions:
make_log_prob(),
run_mcmc(),
get_mle_params(),
get_param_variance_from_chain(),
get_param_covariance_from_chain()
Template PCA rotation
Reference: Berlfein et al. 2024, Appendix A.
Module: sys_mapping.correction
Physical motivation. Systematic template maps are often correlated with each other (e.g. seeing correlates with depth because long exposures under good seeing produce deeper images). When templates are correlated, the contamination amplitudes \(a_i\) cannot be determined independently — the likelihood has ridges in parameter space, MCMC chains mix slowly, and the inferred amplitudes in the original basis are unreliable.
PCA diagonalisation. The template covariance matrix is:
(a symmetric, positive semi-definite \(n_s \times n_s\) matrix). Its eigendecomposition is \(C = V D V^\top\) where \(V\) is the \(n_s \times n_s\) orthogonal matrix of eigenvectors and \(D = \mathrm{diag}(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{n_s})\) contains the eigenvalues (template variances in the rotated basis).
The rotated templates are:
so that \(C' = V^\top C V = D\) is diagonal. The rotated templates are orthogonal by construction:
In the rotated basis, the likelihood factorises across parameters and MCMC convergence is faster. The rotation is applied before MCMC and the back-transformation is applied after:
The eigenvalues \(\lambda_j\) measure the variance of each rotated template and indicate how many independent systematic modes are present. Large gaps in the eigenvalue spectrum suggest that some templates are essentially redundant.
Technical implementation.
rotate_templates() performs the following
steps on a (n_sys, n_pix) NumPy array:
cov = delta_t @ delta_t.T / n_pix # (n_sys, n_sys), O(n_sys² × n_pix)
eigenvalues, eigenvectors = np.linalg.eigh(cov) # symmetric → faster than eig
idx = np.argsort(eigenvalues)[::-1] # descending order
rotation_matrix = eigenvectors[:, idx].T # V^T, rows are eigenvectors
delta_t_rot = rotation_matrix @ delta_t # (n_sys, n_pix)
numpy.linalg.eigh exploits the symmetry and positive semi-definiteness
of the covariance matrix (LAPACK dsyevd driver), making it faster and
more accurate than the general eig decomposition. Cost:
\(O(n_s^3 + n_s \times N)\). At n_sys = 5, n_pix = 10\,000
the call takes approximately 177 μs.
Numerical guarantees:
Off-diagonal elements of the rotated covariance
delta_t_rot @ delta_t_rot.T / n_pixare \(< 10^{-10}\) (orthogonality).Rotation matrix satisfies
R @ R.T = Itoatol < 1e-12.Sum of eigenvalues equals
trace(cov)tortol < 1e-10(variance conservation).
transform_params_from_rotated() applies the
inverse transformation a_orig = V @ a_rot where V = R.T:
V = rotation_matrix.T # (n_sys, n_sys)
a_orig = V @ a_rot # two small (n_sys, n_sys) × (n_sys,) products
b_orig = V @ b_rot
Cost: two \(n_s \times n_s\) matrix–vector multiplies (~4 μs).
Roundtrip precision (rotate → infer → back-transform) is atol < 1e-12
in float64.
In run_decontamination() (MCMC methods),
the covariance matrix is propagated back as:
cov_a = R.T @ cov_a_rot @ R # (n_sys, n_sys) sandwich product
so that uncertainty in the rotated basis is correctly mapped to the original template basis without approximation.
Functions:
rotate_templates(),
transform_params_from_rotated()
Noise debiasing
Reference: Berlfein et al. 2024, Eq. 21.
Module: sys_mapping.correction
Physical motivation. When the true contamination amplitude \(a_i\) is small (close to zero), the posterior median \(\hat{a}_i\) fluctuates around zero due to noise. The squared amplitude \(\hat{a}_i^2\) is therefore always positive even when the true amplitude is zero, and the expectation is:
This is the classical noise bias familiar from power spectrum estimation (where the noise power spectrum must be subtracted from the measured pseudo- \(C_\ell\)).
Debiased estimator. The noise-debiased squared amplitude is:
and analogously:
The \(\max(\cdot, 0)\) clip enforces the physical constraint \(a_i^2 \geq 0\) (squared amplitudes are non-negative). When the true amplitude is zero, the expected debiased estimate is zero on average.
The debiased amplitudes \(\tilde{a}_i^2, \tilde{b}_i^2\) are the inputs to the two-point correction formula (see below). Using the raw (biased) squared amplitudes would over-subtract the template auto-correlation, introducing a spurious dip in the corrected \(w(\theta)\).
Technical implementation.
debias_params() is a pure NumPy function
operating on (n_sys,) float64 arrays:
a_sq = np.maximum(a_hat**2 - var_a, 0.0) # clip at 0 (squared amp ≥ 0)
b_sq = np.maximum(b_hat**2 - var_b, 0.0)
Cost: two element-wise squares, two subtractions, two np.maximum calls —
approximately 4 μs regardless of \(n_s\). When the exact posterior
variance is supplied, the debiased estimate recovers the true
\(a_i^2\) to atol < 1e-10 (float64 round-off limited).
The var_a and var_b inputs are the diagonal of the MCMC posterior
covariance returned by
get_param_variance_from_chain(). In the
PCA-rotated basis the off-diagonal covariance elements are small (templates
are orthogonal after rotation), so the diagonal approximation is accurate.
Function:
debias_params()
Two-point function correction
Reference: Berlfein et al. 2024, Eq. 15–16.
Module: sys_mapping.correction
Derivation sketch. The angular two-point correlation function of the observed field is:
Inserting the contamination model and expanding to first order in \(a_i^2\) and \(b_i^2\) (all cross-terms between different templates \(i \neq j\) vanish on average because templates are uncorrelated with \(\delta_g\) and with each other after normalisation):
where \(w_{t_i t_i}(\theta) = \langle t_i(\hat{n})\,t_i(\hat{n}')\rangle_\theta\) is the template auto-correlation at angular separation \(\theta\).
Solving for \(w_g(\theta)\) and replacing \(a_i^2 \to \tilde{a}_i^2\), \(b_i^2 \to \tilde{b}_i^2\) (debiased estimates):
When to use. Apply this correction after measuring the raw
\(w(\theta)\) from the galaxy catalog and the template auto-correlations
\(w_{t_i t_i}(\theta)\) from the templates. The correction requires
the debiased amplitudes from debias_params().
The corrected \(\hat{w}_{\rm corr}(\theta)\) is an unbiased estimate
of \(w_g(\theta)\) to first order in the contamination amplitudes.
For the harmonic-space (power spectrum) equivalent, see
correct_power_spectrum_harmonic().
Technical implementation.
compute_two_point_correction() operates
on JAX arrays and is the inner kernel:
add_bias = jnp.einsum("i,ij->j", a_sq, template_correlations) # (n_bins,)
mult_bias = jnp.einsum("i,ij->j", b_sq, template_correlations) # (n_bins,)
w_corr = (w_obs - add_bias) / (1.0 + mult_bias)
Two Einstein sums over (n_sys, n_bins) produce (n_bins,) vectors;
cost is \(O(n_s \times n_{\rm bins})\) — negligible. Execution time:
approximately 576 μs at n_sys = 5, n_bins = 15 (dominated by
JAX dispatch overhead). Matches the analytic formula to rtol = 1e-6
(single float32 einsum).
The convenience wrapper
correct_two_point_function() chains the
NumPy debiasing step with the JAX correction kernel and converts the result
back to a NumPy array:
a_sq, b_sq = debias_params(a_hat, b_hat, var_a, var_b) # NumPy, ~4 μs
w_corr = compute_two_point_correction( # JAX, ~576 μs
jnp.asarray(w_obs), jnp.asarray(a_sq),
jnp.asarray(b_sq), jnp.asarray(template_correlations))
return np.asarray(w_corr) # back to NumPy
The template_correlations array (shape (n_sys, n_bins)) is the
auto-correlation \(w_{t_i t_i}(\theta)\) of each template map,
measured with the same angular binning as the galaxy \(w(\theta)\).
Functions:
correct_two_point_function(),
correct_power_spectrum_harmonic()
Likelihood ratio test (model selection)
Reference: Berlfein et al. 2024, Eq. 19; Wilks 1938 (asymptotic theorem).
Module: sys_mapping.model_selection
Physical motivation. Given data, we want to know whether the multiplicative contamination terms \(b_i\) are statistically necessary, or whether the simpler additive model with \(b_i = 0\) provides an adequate description. The LRT is the standard frequentist tool for comparing nested models.
The LRT statistic. For two nested models \(\mathcal{M}_0 \subset \mathcal{M}_1\) with MLEs \(\hat{\Theta}_0\) and \(\hat{\Theta}_1\):
where \(k_0, k_1\) are the numbers of free parameters in each model. For the additive (\(n_s + 1\)) vs combined (\(2n_s + 1\)) comparison, \(r = n_s\).
Interpretation. A large \(\lambda_{\rm LR} \gg \chi^2_{r,0.95}\) (the 95th percentile of the \(\chi^2(r)\) distribution) means the combined model is strongly preferred — the data require multiplicative contamination. A small \(\lambda_{\rm LR} \approx 0\) means the simpler additive model is sufficient.
Note on the sign convention. The LRT statistic is always \(\lambda_{\rm LR} \geq 0\) because adding parameters cannot decrease the maximised log-likelihood. If the two likelihoods are nearly equal, \(\lambda_{\rm LR} \approx 0\) and the data cannot distinguish the models.
Technical implementation.
The LRT uses the same MCMC posterior median as the point estimate for both
models. The test statistic is computed by evaluating
make_log_likelihood() at the two parameter
vectors and differencing:
lrt = 2.0 * (log_lik_combined(theta_1) - log_lik_additive(theta_0))
Both log-likelihood calls use the JIT-compiled function objects, so the
evaluation cost is two forward passes through the likelihood
(~284 μs each). The p-value is obtained from the \(\chi^2(r)\)
CDF via scipy.stats.chi2.sf(lrt, df=r) where
r = n_sys for the additive-vs-combined comparison.
Because the MCMC posterior median — not a true MLE from gradient
optimisation — is used as the parameter estimate, the LRT statistic is a
conservative approximation: the posterior median is inside the posterior
but may not be exactly at the likelihood maximum, so
\(\lambda_{\rm LR}\) may be slightly underestimated. In practice,
for well-sampled posteriors (n_steps >= 1500) the bias is
negligible.
Function:
likelihood_ratio_test()
Pseudo-\(C_\ell\) power spectrum
References: Elsner, Leistedt & Peiris 2016 (MNRAS 456, 2095), Sec. 2–3; Elsner, Leistedt & Peiris 2017 (MNRAS 465, 1847); Leistedt & Peiris 2014 (MNRAS 444, 2); Ho et al. 2012 (ApJ 761, 14).
Module: sys_mapping.power_spectrum
Physical motivation. Instead of working with the real-space two-point function \(w(\theta)\), one can characterise galaxy clustering in harmonic space via the angular power spectrum \(C_\ell\). Each multipole \(\ell\) probes a characteristic angular scale \(\theta \sim \pi/\ell\) (e.g. \(\ell = 100\) corresponds to \(\approx 1.8°\)). Harmonic-space estimators allow easy comparison with theoretical predictions computed via Boltzmann codes (CAMB, CLASS).
Pseudo-:math:`C_ell` estimator. Because the galaxy survey covers only a fraction of the sky, spherical harmonic modes are mixed by the survey mask \(W(p)\). The pseudo-:math:`C_ell` estimator simply applies the mask and computes the power spectrum of the masked map:
where \(Y_{\ell m}(p)\) are the spherical harmonic basis functions and
\(\Omega_p = 4\pi/N_{\rm pix}\) is the pixel solid angle.
In sys_mapping this is computed by passing the masked map to
healpy.anafast.
The pseudo-\(C_\ell\) is related to the true power spectrum \(C_\ell\) via the MASTER equation (Mode-coupling Matrix in Coupled Harmonics for Angular Spectra Techniques, Hivon et al. 2002):
where the mode-coupling matrix \(M_{\ell\ell'}\) depends only on the mask power spectrum \(\tilde{C}_\ell^{WW}\).
Template subtraction (TS) in harmonic space (Elsner+2016, Eq. 1–2). Systematic contamination adds a spurious contribution to the measured pseudo-\(C_\ell\). For an additive contaminant with amplitude \(\hat\alpha_i\), the cleaned power spectrum is:
where \(\hat{C}_\ell^{dd}\) is the measured pseudo-\(C_\ell\) of the galaxy map and \(\hat{C}_\ell^{t_i t_i}\) is the pseudo-\(C_\ell\) of template \(i\).
Harmonic bias (Elsner+2016, Eq. 8). Template subtraction introduces a residual additive bias:
This arises because subtracting a linear combination of templates removes \(n_s\) degrees of freedom from the sky, reducing the effective number of modes at each \(\ell\) by \(n_s\). At low multipoles (\(\ell \lesssim 10\)) and with many templates (\(n_s \gtrsim 10\)) this bias can be significant and must be corrected.
Mode Projection (BMP/EMP). An alternative to template subtraction is to project out the template modes from the covariance matrix before power spectrum estimation. In the Basic Mode Projection (BMP), the pixel-pixel covariance matrix is modified to marginalise over the template amplitudes. The Extended Mode Projection (EMP, Leistedt+2014) additionally discards multipoles where the template signal exceeds a threshold \(k\) times the noise, with an analytical bias correction.
Technical implementation.
Pseudo-Cℓ measurement (measure_pseudo_cl()):
masked_map = delta_g * mask_float # element-wise multiplication
cl = hp.anafast(masked_map, lmax=lmax,
use_pixel_weights=True) # healpy spherical harmonic transform
ell = np.arange(len(cl))
The mask is applied as a float weight map (True → 1.0, False → 0.0).
use_pixel_weights=True enables healpy’s pixel-area weighting scheme, which
corrects for the geometric distortion of the HEALPix equal-area pixels at
high Galactic latitude and improves accuracy at \(\ell \gtrsim N_{\rm side}\).
The returned pseudo_cl is the power of the masked map and must be divided
by \(W_2 = N_{\rm pix}^{-1}\sum_p W^2(p)\) (the mask power) for a
first-order sky-fraction correction.
Template subtraction (subtract_template_cl()):
calls hp.anafast once per template to compute
\(\tilde{C}_\ell^{t_i t_i}\), then subtracts the weighted contribution:
cl_cleaned = pseudo_cl.copy()
for a_i, t_i in zip(alpha, delta_t):
masked_template = t_i * mask_float
cl_t = hp.anafast(masked_template, lmax=lmax, use_pixel_weights=True)
cl_cleaned -= a_i * cl_t
Cost: \(O(n_s)\) harmonic transforms. Each transform has complexity
\(O(N_{\rm pix} \log N_{\rm pix})\) (the healpy SHT).
Harmonic bias correction (harmonic_bias()):
a closed-form expression requiring no iteration:
bias = -n_templates / (2.0 * ell + 1.0) # (n_ell,)
Mode Projection (mode_projection_bias()):
both BMP and EMP use the same harmonic_bias formula; EMP additionally
applies a per-multipole mask based on the template-to-signal ratio:
template_ratio = |harmonic_bias(1, ell)| / (cl_signal / n_templates)
project_mask = template_ratio > threshold
n_projected = project_mask.sum()
bias = np.where(project_mask, harmonic_bias(n_projected, ell), 0.0)
When coupling_matrix is not None, the scalar bias vector is
propagated through the mode-coupling matrix
\(b_\ell^{\rm pseudo} = M_{\ell\ell'} b_{\ell'}\) before subtraction.
Passing coupling_matrix=None uses the identity (full-sky approximation).
Functions:
measure_pseudo_cl(),
subtract_template_cl(),
harmonic_bias(),
mode_projection_bias()
Block bootstrap covariance
References: Berlfein et al. 2024, Sec. 6.2; Ross et al. 2011.
Module: sys_mapping.bootstrap
Physical motivation. Galaxy overdensity maps have spatially correlated structure on scales of tens to hundreds of Mpc (the same clustering signal we are trying to measure). This means that the uncertainty in any estimator computed from these maps cannot be estimated by treating pixels as independent — the effective number of degrees of freedom is far smaller than the pixel count. A spatial block bootstrap preserves the spatial covariance structure by resampling patches rather than individual pixels.
Algorithm. The survey footprint is divided into \(K\) approximately equal-area spatial patches by coarsening to a lower-resolution HEALPix grid. Each bootstrap resample draws \(K\) patches with replacement, then evaluates the estimator \(\hat\theta\) on the resampled dataset. The variance estimate is:
where \(B\) is the number of bootstrap resamples (typical: \(B = 100–500\)).
Typical settings. Use \(K \approx 10–30\) patches and \(B \approx 100–200\) resamples. Fewer patches reduce the spatial resolution of the covariance estimate; more patches increase computational cost linearly.
Technical implementation.
Patch assignment (_assign_patches()):
the footprint is partitioned by coarsening to a lower HEALPix NSIDE:
nside_patch = max(1, nside // int(sqrt(nside2npix(nside) / n_patches)))
theta, phi = hp.pix2ang(nside, pixel_indices) # unmasked pixels only
coarse_pix = hp.ang2pix(nside_patch, theta, phi)
patch_ids = np.unique(coarse_pix, return_inverse=True)[1] # 0..K-1
This ensures patches are approximately equal area (HEALPix is equal-area at each resolution) and spatially contiguous, preserving large-scale galaxy clustering modes within each patch. The coarse NSIDE is chosen so that there are \(\approx n_{\rm patches}\) unique coarse pixels.
Bootstrap loop (block_bootstrap_variance()):
for _ in range(n_bootstrap):
sampled_patches = rng.choice(unique_patches, size=K, replace=True)
pixel_mask = np.concatenate(
[np.where(patch_ids == p)[0] for p in sampled_patches])
dg_boot = delta_g_obs[pixel_mask]
dt_boot = delta_t[:, pixel_mask]
bootstrap_estimates.append(estimator(dg_boot, dt_boot))
variance = np.var(np.stack(bootstrap_estimates), axis=0, ddof=1)
Resampling K patches with replacement means that some patches are
included multiple times and others are omitted, which mimics the sampling
variance of the survey area. The ddof=1 divisor gives an unbiased
estimate of the variance of the estimator.
Performance. At n_boot = 50, n_patches = 8, NSIDE 32, the
entire bootstrap loop takes approximately 32 ms for a simple mean
estimator. Total cost scales linearly as
\(O(n_{\rm boot} \times \text{cost}(\hat\theta))\).
Function:
block_bootstrap_variance()
Jack-knife covariance
References: Ross et al. 2011 (MNRAS 417, 1350), Sec. 4.3; Ho et al. 2012 (ApJ 761, 14).
Module: sys_mapping.bootstrap
Physical motivation. The jack-knife is a deterministic leave-one-out estimator that does not require random resampling. It is more stable than the bootstrap for a fixed number of patches because it uses all \(K\) leave-one-out datasets rather than a random subset.
Algorithm. For each of the \(K\) patches, remove that patch and evaluate the estimator on the remaining \(K-1\) patches. The jack-knife covariance matrix is:
where \(\hat\theta_{(-k)}\) is the estimator with patch \(k\) removed, and \(\bar{\hat\theta} = K^{-1}\sum_k \hat\theta_{(-k)}\) is the mean over jack-knife samples. The prefactor \((K-1)/K\) ensures the estimator is unbiased for the true covariance in the limit of large \(K\).
Comparison with block bootstrap. For a well-designed spatial decomposition, the jack-knife and block bootstrap covariances should agree to within \(\sim 20\%\). The jack-knife is faster (deterministic, \(K\) estimator calls vs \(B\) for bootstrap) but more sensitive to the choice of patch geometry. The bootstrap can capture asymmetric distributions that the jack-knife misses.
Technical implementation.
The jack-knife uses the same _assign_patches helper as the bootstrap,
so patch geometry is identical. The loop is fully deterministic:
leave_one_out = []
for k in unique_patches:
keep = patch_ids != k # boolean mask
estimate_k = estimator(delta_g_obs[keep], delta_t[:, keep])
leave_one_out.append(estimate_k)
estimates = np.stack(leave_one_out) # (K, n_out)
mean_est = estimates.mean(axis=0)
residuals = estimates - mean_est
cov = (K-1)/K * (residuals.T @ residuals)
The outer product residuals.T @ residuals computes the full
(n_out, n_out) covariance matrix in one BLAS-3 call.
For scalar estimators (n_out = 1) this reduces to a sum of squared
deviations. A UserWarning is issued if n_patches < 5 (the
covariance estimate is unreliable when few patches are available).
Cost: \(K\) evaluations of the estimator (compared to
\(B\) for bootstrap, with \(K \ll B\) typically). For K = 10
patches the jack-knife is 10–20× faster than 100–200 bootstrap resamples.
Function:
jackknife_covariance()
ElasticNet regression
Reference: Weaverdyck & Huterer 2021 (MNRAS 503, 5061), Eq. 9–10.
Module: sys_mapping.regression
Physical motivation. When there are many systematic templates (\(n_s \gg 1\)) and only a few carry genuine contamination, standard OLS tends to over-fit: it assigns non-zero amplitudes to all templates, including those that are noise-dominated. Regularised regression shrinks unimportant amplitudes toward zero while retaining the dominant ones.
ElasticNet loss function. ElasticNet minimises:
where \(\boldsymbol\alpha = (a_1,\ldots,a_{n_s})^\top\) are the
additive contamination amplitudes, \(\mathbf{T}\) is the
\(n_s \times N\) template matrix, and the regularisation strengths
satisfy \(\lambda_1 = \lambda\,\rho\) and \(\lambda_2 = \lambda(1-\rho)/2\)
for a total regularisation \(\lambda\) and L1 fraction
\(\rho \in [0,1]\) (l1_ratio in the code).
The L1 term (Lasso regularisation) promotes sparsity — small amplitudes are shrunk exactly to zero, automatically selecting the dominant templates. The L2 term (Ridge regularisation) stabilises correlated templates by sharing the coefficient equally. The ElasticNet combination is preferable when templates are both numerous and correlated.
When alpha_reg=None the optimal \(\lambda\) is found by
\(k\)-fold cross-validation on the pixel grid.
Pixel weights. After fitting, systematic weights are assigned as:
with \(\epsilon = 10^{-6}\) to prevent division by zero.
Requires: scikit-learn >= 1.3 (pip install sys_mapping[regression])
Technical implementation.
elasticnet_contamination_fit() transposes
the template matrix so that scikit-learn receives the standard design matrix
convention (samples × features):
X = delta_t.T # (n_pix, n_sys) — samples × features for sklearn
y = delta_g_obs # (n_pix,)
When alpha_reg=None (auto-tune), sklearn.linear_model.ElasticNetCV
runs cv_folds-fold cross-validation on the pixel grid. The pixel grid
is split into contiguous chunks of roughly \(N/k\) pixels each
(scikit-learn’s default stratified-free split), so folds may mix spatially.
For strongly spatially correlated overdensity maps this can underestimate the
optimal \(\lambda\); in practice the effect is small relative to the
statistical noise. The CV-selected alpha_ is stored in cv_info.
model = ElasticNetCV(l1_ratio=l1_ratio, cv=cv_folds, n_jobs=-1)
model.fit(X, y)
alpha_reg = float(model.alpha_) # optimal λ from CV
cv_scores = model.mse_path_.mean(axis=-1) # mean MSE vs λ grid
The fitted coefficients are extracted as model.coef_ (shape
(n_sys,)). Per-pixel weights are then computed and clipped to
\([1/20, 20]\) to prevent numerical blow-up in pixels where
\(1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p) \approx 0\):
denominator = np.maximum(1.0 + alpha_hat @ delta_t, 1e-6)
weights = np.clip(1.0 / denominator, 1.0/20.0, 20.0)
The clip bound _ISD_MAX_WEIGHT = 20 is shared with the ISD pipeline;
it corresponds to a denominator floor of 0.05 — generous enough not to
bias well-behaved scenarios.
Functions:
elasticnet_contamination_fit(),
method_comparison()
Iterative Systematics Decontamination (ISD)
References: Rodríguez-Monroy et al. 2025; Rezaie et al. 2020.
Module: sys_mapping.regression
Physical motivation. Standard OLS fits a linear model \(\hat\delta_g \approx \sum_i \alpha_i t_i\). When contamination is multiplicative, the correct model is \(\hat\delta_g \approx \delta_g(1 + \sum_i b_i t_i)\), which is non-linear in the templates. ISD approximates this non-linearity by expanding the template basis with polynomial cross-terms and iterating the OLS solution.
Algorithm. Starting with the original templates \(\{t_i\}_{i=1}^{n_s}\), the expanded basis at polynomial order \(d\) includes all products \(t_i^{k_1} t_j^{k_2}\cdots\) with \(k_1 + k_2 + \cdots \leq d\). At order \(d=1\) this is just the original templates (standard OLS). At order \(d=3\) the cross terms \(t_i t_j\) (for \(i \leq j\)) and triple products \(t_i t_j t_k\) are added. For \(n_s\) templates the expanded basis has \(\binom{n_s + d}{d} - 1\) columns (e.g. 55 columns for \(n_s = 5, d = 3\)).
The iterative loop:
Form the weighted normal equations using per-pixel weights \(w^{(k)}(p)\):
\[\bigl(\mathbf{X}^\top \mathbf{W}^{(k)} \mathbf{X} + \mathbf{\Lambda}\bigr) \hat{\boldsymbol\alpha}^{(k)} = \mathbf{X}^\top \mathbf{W}^{(k)} \hat{\boldsymbol\delta}_g\]where \(\mathbf{X}\) is the \((N_{\rm pix} \times N_{\rm exp})\) expanded design matrix (constant across iterations), \(\mathbf{W}^{(k)} = \mathrm{diag}(w^{(k)})\), and \(\mathbf{\Lambda}\) is an optional ridge penalty matrix (see the ridge regularisation note below).
Update weights from the linear coefficients only (first \(n_s\) entries of \(\hat{\boldsymbol\alpha}^{(k)}\)):
\[w^{(k+1)}(p) = \frac{1}{1 + \hat{\boldsymbol\alpha}^{(k)}_{\rm lin} \cdot \mathbf{t}(p)}\]Repeat until convergence: \(\|\mathbf{w}^{(k+1)} - \mathbf{w}^{(k)}\|_2 / \|\mathbf{w}^{(k)}\|_2 < \varepsilon = 10^{-5}\).
The polynomial cross-terms improve the linear coefficient estimates but are not propagated into the weight formula, which keeps the same form as plain OLS. The cleaned field is recovered as \(\hat\delta_g^{\rm clean}(p) = w(p)(1 + \hat\delta_g^{\rm obs}(p)) - 1\).
Note
Speed (v0.2.4). Since \(\mathbf{X}\) is constant across iterations, it is precomputed once outside the loop as \(\mathbf{X}^\top\) (shape \(N_{\rm exp} \times N_{\rm pix}\)). The per-iteration cost is then one BLAS-3 matrix multiply
producing an \((N_{\rm exp} \times N_{\rm exp})\) matrix — tiny
regardless of \(N_{\rm pix}\) — followed by lstsq on this
small system. This replaces the previous approach of computing the SVD
of the full \((N_{\rm pix} \times N_{\rm exp})\) matrix at each
iteration, giving a 5–15× speed-up per iteration for realistic pixel
counts (\(N_{\rm pix} \gtrsim 10^4\)).
Note
ISD-3 accuracy: ridge regularisation on polynomial columns (v0.2.4). A key source of ISD-3 instability is a fixed-point inconsistency: the regression minimises residuals over all \(N_{\rm exp}\) columns (including polynomial cross-terms), but only the first \(n_s\) linear coefficients drive the weight update. When the polynomial columns are correlated with the linear ones — which is always the case for survey systematics — they absorb a share of the linear signal. The linear coefficients \(\hat{\boldsymbol\alpha}_{\rm lin}\) are left systematically small, and the iteration converges to a fixed point where the contamination has not actually been removed.
The fix is a ridge penalty \(\lambda_{\rm poly}\) applied exclusively to the polynomial-only columns (indices \(n_s:\) of the expanded basis):
The linear columns are never penalised, so the weight update is not biased. The penalty suppresses the polynomial terms just enough to prevent them from over-fitting noise or absorbing linear signal.
Recommended value: \(\lambda_{\rm poly} \approx 10^{-3}\,{\rm Var}(\hat\delta_g)\).
This is small enough to leave the result unaffected when the polynomial
terms are genuinely needed, and large enough to stabilise ISD-3 when
templates are strongly correlated. Exposed as the lambda_poly
argument to iterative_systematics_decontamination()
and as isd_lambda_poly in run_decontamination().
Default is 0.0 (plain OLS, backward-compatible).
Convergence. ISD-1 reliably converges within ≲ 25 iterations for all
tested configurations. ISD-3 with lambda_poly = 0 may still oscillate
when templates are strongly correlated; setting
lambda_poly = 1e-3 * np.var(delta_g_obs) resolves this in practice.
In the worst case the function returns weights at max_iter, which
remain a useful (if not fully converged) correction.
When to use.
ISD-1 (
poly_order=1): equivalent to OLS; fastest option; use as a baseline.ISD-3 (
poly_order=3): partially corrects multiplicative contamination without a full MCMC run. Setisd_lambda_poly = 1e-3 * np.var(delta_g_obs)when ISD-3 produces implausible solutions or fails to converge.MCMC-combined: preferred for high-precision analyses with known multiplicative contamination — it jointly infers additive and multiplicative amplitudes with full posterior uncertainties.
Technical implementation — polynomial expansion.
The expanded template basis is built with
itertools.combinations_with_replacement, which generates all
multi-indices of degree \(\leq d\) in lexicographic order:
rows = [delta_t] # (n_sys, n_pix) — linear terms always first
for deg in range(2, poly_order + 1):
for combo in combinations_with_replacement(range(n_sys), deg):
product = ones(n_pix)
for idx in combo:
product *= delta_t[idx]
rows.append(product[newaxis, :])
delta_t_expanded = vstack(rows) # (n_expanded, n_pix)
For n_sys = 5, poly_order = 3 this yields
\(\binom{5+3}{3} - 1 = 55\) columns. If n_pix < 10 × n_expanded
the system would be poorly conditioned; poly_order is reduced
automatically with a UserWarning.
Technical implementation — weighted normal equations.
The design matrix X = delta_t_expanded.T (shape (n_pix, n_expanded))
is computed once before the iteration loop. Each iteration updates only
the diagonal weight matrix:
Xt = delta_t_expanded # (n_expanded, n_pix), constant
X = delta_t_expanded.T # (n_pix, n_expanded), constant
ridge_diag = zeros(n_expanded)
ridge_diag[n_sys:] = lambda_poly # penalty on polynomial columns only
for iteration in range(1, max_iter + 1):
Xw = Xt * weights # (n_expanded, n_pix): broadcast scale
XtWX = Xw @ X # (n_expanded, n_expanded): BLAS-3
XtWX.flat[::n_expanded+1] += ridge_diag # in-place diagonal add
XtWy = Xw @ delta_g_obs # (n_expanded,): BLAS-1
alpha_hat_all, *_ = lstsq(XtWX, XtWy) # small system
The key insight is that Xt * weights is a row-broadcast (scale each row
of Xt by the corresponding weight), which is a single NumPy ufunc call,
not a loop. The resulting (n_expanded, n_expanded) matrix (e.g. 55×55)
is tiny and the subsequent lstsq is negligible. This replaces the
previous approach of computing the SVD of the full
(n_pix × n_expanded) matrix at each iteration, yielding a
5–15× speed-up for \(N_{\rm pix} \gtrsim 10^4\).
Weight update and clipping.
Only the first n_sys (linear) coefficients drive the weight update:
weights = 1.0 / max(1.0 + alpha_hat_all[:n_sys] @ delta_t, 1e-6)
weights = np.clip(weights, 1.0/20.0, 20.0) # hard bound
Pixels where 1 + a@t → 0 would otherwise receive weights up to
\(10^6\), making all subsequent weighted OLS fits numerically
meaningless. The clip to \([1/20, 20]\) keeps the iteration stable.
Convergence criterion.
rel_change = ||w_new - w_old|| / (||w_old|| + 1e-30)
if rel_change < tol (default 1e-5): break
ISD-1 typically converges in \(\lesssim 25\) iterations. ISD-3 with
lambda_poly = 0 may oscillate when templates are strongly correlated;
setting lambda_poly = 1e-3 × Var(δ_g) resolves this.
Two-pass outlier handling (in run_decontamination()).
After pass 1, pixels that hit the weight clip boundary are flagged as
outliers:
boundary = 20.0 * (1 - 1e-6)
outlier_mask = (weights >= boundary) | (weights <= 1.0/boundary)
If any outliers exist and there are at least \(\max(10 n_s, 50)\)
clean pixels, pass 2 refits ISD on the clean subset only, then applies the
clean coefficients to all pixels (including outliers). This two-pass scheme
prevents the handful of pathological pixels from biasing the overall fit.
The isd_outlier_mask and isd_masked_fraction fields in the result
dict report how many pixels were flagged.
Functions:
iterative_systematics_decontamination(),
run_decontamination()
Null tests and SNR ranking
References: Ross et al. 2011, Sec. 3.2 (null tests); Tanidis et al. 2026 (SNR ranking); Weaverdyck & Huterer 2021 (method comparison); Rodríguez-Monroy et al. 2025 (footprint masking).
Module: sys_mapping.diagnostics
Null test. After applying the systematic correction, the per-pixel weights \(w(p)\) should be statistically uncorrelated with all template maps. If the correction is complete, the Pearson correlation:
should satisfy \(|r_i| \approx 0\) for all \(i\). A significant residual correlation indicates that template \(i\) has not been adequately corrected. Permutation p-values are computed by shuffling \(w(p)\) and recomputing \(r_i\) a large number of times.
SNR ranking. Three estimators rank templates by their contaminating power:
"template"— \({\rm SNR}_i = |\hat\alpha_i| / \sigma_{\hat\alpha_i}\), where \(\sigma_{\hat\alpha_i}\) is the OLS standard error. This is the standard frequentist significance of each template in the linear fit."data"— \({\rm SNR}_i = |\mathrm{Corr}(\hat\delta_g, t_i)|\), the absolute Pearson correlation between the observed overdensity and template \(i\). Fast to compute; does not require fitting."peak"— the peak value of the cross pseudo-\(C_\ell\) \(\hat{C}_\ell^{dt_i}\) divided by the noise level. Identifies templates that contaminate on specific angular scales.
Footprint masking sensitivity. Cutting the survey at different mask thresholds (progressively removing pixels with extreme template values) and measuring how the fitted amplitudes \(\hat\alpha_i\) change is a powerful sanity check: genuinely contaminated fields should show stable or decreasing amplitudes as the worst-affected pixels are removed (Rodríguez-Monroy+2025).
Technical implementation.
Null test (null_test_cross_correlations()):
the Pearson correlation is computed in a numerically stable form using
centred vectors:
w_centered = weights - weights.mean()
w_norm = sqrt(sum(w_centered**2))
for i, t_i in enumerate(delta_t):
t_centered = t_i - t_i.mean()
t_norm = sqrt(sum(t_centered**2))
r[i] = sum(w_centered * t_centered) / (w_norm * t_norm)
Division by zero is guarded: if either norm is < 1e-30,
r[i] = 0.0 is returned rather than raising an exception.
Permutation p-values shuffle only the weight vector (not the templates), preserving the spatial structure of the templates:
for _ in range(n_bootstrap):
w_shuffled = rng.permutation(w_centered)
r_perm = abs(sum(w_shuffled * t_centered) / (w_norm * t_norm))
if r_perm >= |r_obs|: count_extreme += 1
p_value = (count_extreme + 1) / (n_bootstrap + 1) # Laplace smoothing
The +1 in numerator and denominator is the standard conservative
correction that ensures p_value > 0 even when no permutation exceeds
the observed statistic.
SNR ranking (snr_template_ranking()):
``”template”`` — fits OLS on the full template matrix
X = delta_t.T, estimates residual variance \(\hat\sigma^2 = \text{RSS}/(N-n_s)\), and computes the OLS t-statistic per parameter:alpha_hat, *_ = lstsq(X, delta_g_obs) sigma2 = sum(residual**2) / max(n_pix - n_sys, 1) XtX_inv = pinv(X.T @ X) param_var = sigma2 * diag(XtX_inv) snr = |alpha_hat| / sqrt(maximum(param_var, 1e-30))
Uses
np.linalg.pinvfor numerical stability when templates are nearly collinear.``”data”`` — computes
|Corr(delta_g, t_i)|in a loop; cost \(O(n_s \times N)\).``”peak”`` — calls
hp.anafasttwice per template (once for the galaxy map, once cross with each template), then normalises by the median power of the galaxy auto-spectrum as a noise-level proxy:cl_g = hp.anafast(delta_g_obs, lmax=lmax) noise_level = median(|cl_g|) + 1e-30 for t_i in delta_t: cl_cross = hp.anafast(delta_g_obs, map2=t_i, lmax=lmax) snr[i] = max(|cl_cross|) / noise_level
Footprint masking diagnostics
(footprint_mask_diagnostics()):
pixels are ranked by their RMS template value:
rms_per_pixel = sqrt(mean(delta_t**2, axis=0)) # (n_pix,): ∑ t²_i / n_s
rank_order = argsort(rms_per_pixel) # ascending
For each masking level the lowest-rms pixels are dropped first, ensuring
that the surviving pixels have progressively higher systematic leverage.
OLS is refitted on the surviving pixels at each level, and the amplitude
scatter across levels quantifies sensitivity to the masking threshold:
alpha_all[k], *_ = lstsq(delta_t[:, keep].T, delta_g_obs[keep])
scatter = std(alpha_all, axis=0) # (n_sys,): near zero → robust
Functions:
null_test_cross_correlations(),
snr_template_ranking(),
footprint_mask_diagnostics()
Mock catalog generation
References: Berlfein et al. 2024 (validation strategy); Coles & Jones 1991 (lognormal field model); Weaverdyck & Huterer 2021 (mock injection tests).
Module: sys_mapping.mocks
Physical motivation. Validating a systematic decontamination method requires knowing the true underlying galaxy field and the true contamination amplitudes. Synthetic (mock) catalogs are generated with a known ground truth, so parameter recovery and decontamination quality can be measured exactly — something impossible with real data.
Lognormal galaxy field. The true galaxy overdensity is:
where \(G(p)\) is a zero-mean Gaussian random field drawn from a power spectrum \(C_\ell^G \propto (\ell+1)^{-2}\), normalised so that the pixel-to-pixel variance equals \(\sigma^2\). This form ensures:
\(\delta_g^{\rm true}(p) > -1\) everywhere (physically: number density \(n_g \propto 1 + \delta_g > 0\)).
\(\mathbb{E}[\delta_g] = 0\) by the normalisation of the exponential.
The distribution is positively skewed (characteristic of the real galaxy distribution).
Poisson sampling. Galaxy counts are drawn as:
where \(\bar{n}\) is the mean galaxy count per pixel (set by the
n_mean parameter) and \(\hat\delta_g(p)\) is the contaminated
overdensity. A random catalog of \(N_r = 8\bar{n}N\) objects is
generated uniformly over the unmasked footprint.
Technical implementation.
Lognormal field generation (generate_lognormal_field()):
a Gaussian random field \(G(p)\) is drawn with healpy.synfast from
a power-law angular power spectrum \(C_\ell^G \propto (\ell+1)^{-2}\),
normalised so that \(\mathrm{Var}(G) = \sigma^2\):
ell = np.arange(lmax + 1)
cl_g = (ell + 1.0)**(-2)
cl_g /= cl_g.sum() # normalise to unit variance
cl_g *= sigma**2 # scale to target variance
G = hp.synfast(cl_g, nside=nside, lmax=lmax)
delta_g_true = np.exp(G - sigma**2 / 2.0) - 1.0 # ensures E[δ] = 0
The \(-\sigma^2/2\) shift is the standard log-normal mean correction that ensures \(\mathbb{E}[\delta_g] = 0\) exactly (when \(G \sim \mathcal{N}(0, \sigma^2)\) the expectation of \(e^G\) is \(e^{\sigma^2/2}\)).
Contamination injection: the true overdensity is contaminated using
apply_contamination() with the known
ground-truth amplitudes (a_true, b_true), so that parameter recovery
tests compare inferred amplitudes directly against the injected values.
Poisson sampling (make_mock_catalog()):
n_g = rng.poisson(n_mean * (1.0 + delta_g_obs)) # (n_pix,) integer counts
# Random catalog: N_r = 8 * n_mean * n_pix objects uniformly over footprint
N_r = int(8 * n_mean * good_pixels.sum())
The ratio \(N_r / N_g \approx 8\) ensures that the random catalog shot noise contributes negligibly to the Landy–Szalay estimator variance (Landy & Szalay 1993 recommend \(N_r / N_g \geq 5\)).
Functions:
generate_lognormal_field(),
make_galactic_mask(),
make_mock_catalog(),
make_mock_suite(),
MockCatalog()
HEALPix map utilities
Module: sys_mapping.maps
HEALPix (Hierarchical Equal Area isoLatitude Pixelisation) divides the sphere into \(N_{\rm pix} = 12\,{\rm NSIDE}^2\) equal-area pixels. Common NSIDE values and their properties:
NSIDE |
\(N_{\rm pix}\) |
Pixel area (deg²) |
Angular resolution (arcmin) |
|---|---|---|---|
32 |
12 288 |
3.36 |
≈ 110 |
64 |
49 152 |
0.84 |
≈ 55 |
128 |
196 608 |
0.21 |
≈ 27 |
512 |
3 145 728 |
0.013 |
≈ 7 |
For LS10 BGS analyses NSIDE = 64–128 is used.
Workflow to go from catalogs to overdensity and template arrays:
pixelize_catalog()— bin RA/Dec positions into a HEALPix count map.compute_overdensity()— form \(\hat\delta_g(p) = n_g(p)/(f_r n_r(p)) - 1\) and return the boolean unmasked-pixel array.assign_template_values()— extract template values at unmasked pixels, returning a(n_sys, n_good_pix)array.
Technical implementation.
The three-step catalog-to-overdensity pipeline executes as follows:
Step 1 — pixelise (pixelize_catalog()):
pix_ids = hp.ang2pix(nside, ra, dec, lonlat=True) # RA/Dec → HEALPix ring index
count_map = np.bincount(pix_ids, minlength=12*nside**2).astype(float)
hp.ang2pix with lonlat=True accepts RA/Dec in degrees. The result
is a (12×NSIDE²,) integer count map covering the full sky (zeros outside
the footprint). The bincount is \(O(N_{\rm gal})\).
Step 2 — overdensity (compute_overdensity()):
f_r = n_gal.sum() / n_rand.sum() # galaxy-to-random normalisation
good_pixels = (n_rand > 0) & mask # boolean, (n_pix,)
delta_g_full = np.zeros(n_pix)
delta_g_full[good_pixels] = (
n_gal[good_pixels] / (f_r * n_rand[good_pixels]) - 1.0)
The normalisation factor \(f_r = N_g / N_r\) accounts for the different total sizes of the galaxy and random catalogs. Pixels with zero random count are masked to avoid division by zero.
Step 3 — template extraction (assign_template_values()):
delta_t = template_full_sky[:, good_pixels] # (n_sys, n_good_pix)
# Centre and normalise each template to zero mean, unit variance
delta_t -= delta_t.mean(axis=1, keepdims=True)
delta_t /= delta_t.std(axis=1, keepdims=True)
The centring and normalisation ensure that template amplitudes \(a_i, b_i\) are dimensionless and comparable across templates of different physical origins.
Functions:
pixelize_catalog(),
compute_overdensity(),
assign_template_values(),
generate_systematic_map(),
generate_systematic_maps(),
systematic_power_spectrum()
Two-point function measurement
Module: sys_mapping.utils
The angular two-point correlation function \(w(\theta)\) is measured using the Landy-Szalay estimator:
where \(DD(\theta)\), \(DR(\theta)\), \(RR(\theta)\) are the normalised pair counts in angular separation bin \(\theta\) for galaxy-galaxy, galaxy-random, and random-random pairs respectively. The estimator is unbiased for large random catalogs and has minimum variance for a Poisson-distributed galaxy field (Landy & Szalay 1993, ApJ 412, 64).
sys_mapping provides wrappers for TreeCorr (which uses a ball-tree for
efficient pair counting, \(\mathcal{O}(N\log N)\)) and for Corrfunc (highly optimised C implementation for
large catalogs).
Technical implementation.
TreeCorr backend (measure_two_point_function()):
wraps treecorr.NNCorrelation with a ball-tree pair counter. The angular
bins are specified in log-space (min_sep, max_sep in degrees,
nbins bins), and the catalog is passed once for DD and once for DR:
cat_g = treecorr.Catalog(ra=ra_g, dec=dec_g, ra_units="deg", dec_units="deg")
cat_r = treecorr.Catalog(ra=ra_r, dec=dec_r, ra_units="deg", dec_units="deg")
nn = treecorr.NNCorrelation(min_sep=min_sep, max_sep=max_sep,
nbins=nbins, sep_units="deg")
nn.process(cat_g) # DD counts
nr = nn.copy(); nr.process(cat_g, cat_r) # DR counts
rr = nn.copy(); rr.process(cat_r) # RR counts
xi, varxi = nn.calculateXi(rr=rr, dr=nr) # Landy–Szalay
TreeCorr’s ball-tree algorithm has \(O(N \log N)\) pair-count complexity; it applies the Landy–Szalay formula internally.
Corrfunc backend (measure_two_point_function_corrfunc()):
calls Corrfunc.mocks.DDtheta_mocks directly on RA/Dec arrays. Corrfunc
is a highly optimised C library that vectorises the inner pair loop with AVX2
SIMD instructions and outperforms TreeCorr by 2–5× for
\(N \gtrsim 10^5\):
DD = Corrfunc.mocks.DDtheta_mocks(autocorr=True, nthreads=n_threads,
binfile=theta_bins, RA=ra_g, DEC=dec_g)
DR = Corrfunc.mocks.DDtheta_mocks(autocorr=False, nthreads=n_threads,
binfile=theta_bins, RA1=ra_g, DEC1=dec_g, RA2=ra_r, DEC2=dec_r)
RR = Corrfunc.mocks.DDtheta_mocks(autocorr=True, nthreads=n_threads,
binfile=theta_bins, RA=ra_r, DEC=dec_r)
w_theta = (DD - 2*DR + RR) / RR # Landy–Szalay (normalised pair counts)
Both backends return the same (n_bins,) array w_theta and are
interchangeable. The Corrfunc backend should be preferred for production
runs with \(N > 10^6\) galaxies.
Functions:
measure_two_point_function(),
measure_two_point_function_corrfunc(),
measure_kk_correlation_treecorr(),
measure_kk_correlation_corrfunc(),
compute_covariance_matrix(),
compute_amplitude_bias()
Method comparison table
Method |
Reference(s) |
Status |
Module |
|---|---|---|---|
Additive contamination model (MCMC) |
Berlfein+2024 |
Implemented |
|
Multiplicative contamination model (MCMC) |
Berlfein+2024 |
Implemented |
|
Combined contamination model (MCMC) |
Berlfein+2024 |
Implemented |
|
PCA template rotation |
Berlfein+2024 |
Implemented |
|
Noise debiasing |
Berlfein+2024 |
Implemented |
|
Two-point function correction |
Berlfein+2024 |
Implemented |
|
Likelihood ratio model selection |
Berlfein+2024 |
Implemented |
|
Block bootstrap covariance |
Berlfein+2024 |
Implemented |
|
Jack-knife covariance |
Ross+2011, Ho+2012 |
Implemented |
|
Null test cross-correlations |
Ross+2011 |
Implemented |
|
Pseudo-\(C_\ell\) estimator |
Elsner+2016, Ho+2012 |
Implemented |
|
Harmonic template subtraction |
Elsner+2016 |
Implemented |
|
Harmonic bias correction |
Elsner+2016 |
Implemented |
|
Basic / Extended mode projection (BMP/EMP) |
Elsner+2016, Leistedt+2014 |
Implemented |
|
Harmonic power spectrum correction pipeline |
Elsner+2016, Elsner+2017 |
Implemented |
|
ElasticNet regression |
Weaverdyck+2021 |
Implemented |
|
Iterative Systematics Decontamination (ISD) |
Rodríguez-Monroy+2025 |
Implemented |
|
Multi-method comparison framework |
Weaverdyck+2021 |
Implemented |
|
SNR-based template ranking |
Tanidis+2026 |
Implemented |
|
Footprint masking diagnostics |
Rodríguez-Monroy+2025 |
Implemented |
|
NaMaster coupling matrix (exact) |
Alonso+2019, Elsner+2017 |
Optional backend (planned) |
|
Catalogue-based PCL deprojection |
Cornish+2026 |
Deferred |
|
Transfer function calibration |
Cornish+2026 |
Deferred |
|
ANN / deep-learning weights |
Rezaie+2020 |
Not planned |
External pre-processing |
Optimal quadratic estimator (QMV) |
Ho+2012 |
Not planned |
External (NaMaster) |
Shear E/B quadratic estimator |
Tanidis+2026 |
Not planned |
Weak-lensing specific |