Methods Reference

This page maps every method implemented in sys_mapping to its source publication, provides the key equations with all notation spelled out, and gives physical motivation and practical guidance on when to use each method. For notation conventions see Overview; for bibliographic detail on each reference see Bibliography.

Core contamination model 

Reference: Berlfein et al. 2024 (MNRAS 531, 4954), Eq. 11–13.

Module: sys_mapping.contamination

Physical motivation. A photometric galaxy survey assigns each galaxy to a HEALPix pixel based on its observed position. The measured number count \(n_g(p)\) per pixel can differ from the true number count \(n_g^{\rm true}(p)\) for two physically distinct reasons:

Additive contamination: a contaminant contributes galaxy-like objects to the catalog, or masks real galaxies over a spatially coherent pattern. Examples: stellar contamination (stars misclassified as galaxies add to counts), fibre allocation inefficiency (sparse regions are under-observed). These shift \(\delta_g\) by a fixed amount proportional to \(t_i\).
Multiplicative contamination: a varying selection efficiency scales galaxy counts up or down relative to the true density. Examples: depth variations (seeing, sky background, airmass) modulate the survey completeness as a function of position. These multiply \(1+\delta_g\) by a factor proportional to \(1 + b_i t_i\).

The combined forward model is:

\[\hat{\delta}_{g}(p) = \delta_{g}(p)\!\left(1 + \sum_{i=1}^{n_s} b_i\,t_i(p)\right) + \sum_{i=1}^{n_s} a_i\,t_i(p)\]

where \(\delta_g(p) = n_g^{\rm true}(p)/\bar{n}_g - 1\) is the true overdensity, \(\hat\delta_g(p)\) is the observed overdensity, and \(t_i(p)\) is systematic template \(i\) normalised to zero mean and unit variance over the footprint. The amplitudes \(a_i\) (additive) and \(b_i\) (multiplicative) are the free parameters to be inferred.

Three nested models are obtained by constraining these amplitudes:

Model	Free parameters	Constraint	Use case
`additive`	\(a_i\), \(\sigma\)	\(b_i = 0\)	Stellar contamination, photometric bias
`multiplicative`	\(a_i\), \(\sigma\)	\(b_i = a_i\)	Depth / completeness variations
`combined`	\(a_i, b_i\), \(\sigma\)	none	Unknown contamination type; general case

Inverting the model. The clean overdensity can be recovered from the observed one via:

\[\delta_g(p) = \frac{\hat{\delta}_g(p) - \sum_i a_i\,t_i(p)} {1 + \sum_i b_i\,t_i(p)}\]

This inversion is valid only when \(1 + \sum_i b_i t_i(p) > 0\) in every pixel (i.e. the multiplicative modulation never entirely suppresses the galaxy density). For typical values \(|b_i| \lesssim 0.3\) and unit-normalised templates this condition is satisfied almost everywhere.

Weight-based correction for linear methods. For all additive-type methods (OLS, ElasticNet, ISD, MCMC-additive), the corrected field is computed as (Weaverdyck & Huterer 2020, Eq. 46-47):

\[\delta_g^{\rm corr}(p) = w(p)\,(1 + \hat{\delta}_g(p)) - 1, \qquad w(p) = \frac{1}{1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p)}\]

This is exact for multiplicative contamination and equals the simpler subtraction \(\hat\delta_g - \hat{\boldsymbol\alpha}\cdot\mathbf{t}\) at first order in \(\hat{\boldsymbol\alpha}\cdot\mathbf{t}\). Using the weight form preserves positivity (\(1 + \delta_g^{\rm corr} > 0\)) and is the convention in the reference literature (Weaverdyck & Huterer 2020; Rodríguez-Monroy et al. 2025); it is applied uniformly throughout this package. For MCMC-combined the full inversion formula above is used instead.

Technical implementation. All arithmetic is performed on JAX arrays (jax.Array) to enable JIT compilation and automatic differentiation. The forward and inverse models are implemented as pure functions with no mutable state:

Forward model (apply_contamination()):

mult_term = jnp.einsum("i,ij->j", b, delta_t)   # Σ b_i t_i(p)
add_term  = jnp.einsum("i,ij->j", a, delta_t)   # Σ a_i t_i(p)
delta_g_obs = delta_g * (1.0 + mult_term) + add_term

Both Einstein sums are single BLAS-1 operations of cost \(O(n_s \times N)\). At n_pix = 10\,000, n_sys = 5 the forward call runs in approximately 830 μs after JIT warm-up (float32 accumulation; max absolute roundtrip error < 1e-5).

Inverse model (invert_contamination()): adds an element-wise division (≈ 1 730 μs at the same size).

Parameter vector layout (pack_params(), unpack_params()): the flat float64 vector fed to emcee / scipy optimisers has layout

[a_0, …, a_{n_s-1},   ← always present
 b_0, …, b_{n_s-1},   ← combined model only
 sigma,                ← always
 gamma]                ← skew-normal only

Slice operations preserve values to machine precision. The match/case dispatch (Python 3.10+) has zero runtime cost.

Functions: apply_contamination() (forward model), invert_contamination() (recover clean field), pack_params() / unpack_params() (parameter vector layout), n_free_params(), compute_two_point_correction()

Gaussian and skew-normal likelihood 

Reference: Berlfein et al. 2024, Eq. 17–18.

Module: sys_mapping.likelihood

Physical motivation. To infer the contamination amplitudes \((a_i, b_i)\) from data, we need a likelihood model for the distribution of the clean overdensity field \(\delta_g(p)\). At the pixel level (with \(\bar{n} \gtrsim 30\) galaxies per pixel), Poisson fluctuations in the count map are approximately Gaussian. The central limit theorem ensures that the overdensity in each pixel approaches a Gaussian as \(\bar{n} \to \infty\).

Gaussian likelihood. Assuming \(\delta_g(p) \stackrel{\rm iid}{\sim} \mathcal{N}(0, \sigma^2)\) for each unmasked pixel \(p\), the log-likelihood is (Eq. 17):

\[\ln\mathcal{L}(\boldsymbol\Theta) = -\frac{N}{2}\ln(2\pi\sigma^2) - \underbrace{\sum_p \ln\!\left|1 + \sum_i b_i\,t_i(p)\right|}_{\text{Jacobian of the change of variables}} - \frac{1}{2\sigma^2}\sum_p \underbrace{\left[\frac{\hat\delta_g(p) - \sum_i a_i t_i(p)}{1+\sum_i b_i t_i(p)}\right]^2}_{\delta_g(p)^2}\]

Here \(\boldsymbol\Theta = (a_1,\ldots,a_{n_s},b_1,\ldots,b_{n_s},\sigma)\), \(N\) is the number of unmasked pixels, and the Jacobian term accounts for the change of variable from \(\hat\delta_g\) to \(\delta_g\). For the additive model (\(b_i=0\)) the Jacobian term vanishes.

Skew-normal likelihood. For a lognormal galaxy field, \(\delta_g = e^{G-\sigma_G^2/2}-1\) has positive skewness \(s \approx 3\sigma_G + \sigma_G^3 > 0\). Fitting a Gaussian likelihood to a positively-skewed field introduces a systematic bias in the inferred amplitudes. The skew-normal extension (Eq. 18) corrects for this by adding a log-CDF term:

\[\ln\mathcal{L}_{\rm skew} = \ln\mathcal{L}_{\rm Gauss} + N\ln 2 + \sum_p \ln\Phi\!\left(\gamma\,\frac{\delta_g(p) - \xi}{\sigma}\right)\]

where \(\Phi(z) = \tfrac{1}{2}[1+{\rm erf}(z/\sqrt{2})]\) is the standard-normal CDF, \(\gamma > 0\) is the skewness shape parameter (fitted alongside \(a_i, b_i, \sigma\)), and \(\xi = -\sigma\delta\sqrt{2/\pi}\) with \(\delta = \gamma/\sqrt{1+\gamma^2}\) is a mean-shift that ensures \(\mathbb{E}[\delta_g]=0\) is maintained. Setting \(\gamma=0\) recovers the Gaussian exactly.

Technical implementation. make_log_likelihood() is a factory: it captures n_sys, model, and use_skewed as closed-over Python integers/booleans that become compile-time constants during JAX tracing. The returned function is decorated with @jax.jit and compiled on the first call (~**227 ms** compile latency; reuse the function object across all MCMC iterations).

Gaussian path (use_skewed=False):

mult_term   = jnp.einsum("i,ij->j", b, delta_t)
add_term    = jnp.einsum("i,ij->j", a, delta_t)
log_jac     = jnp.sum(jnp.log(jnp.abs(1.0 + mult_term)))   # Jacobian
delta_g_clean = (delta_g_obs - add_term) / (1.0 + mult_term)
log_gauss   = -0.5*N*log(2πσ²) - (1/2σ²)*Σ delta_g_clean²
return log_gauss - log_jac

Execution time after JIT warm-up: ~284 μs/call at \(N = 10\,000\), \(n_s = 5\) on CPU. At zero contamination, matches scipy.stats.norm.logpdf sum to rtol = 1e-5.

Skew-normal path (use_skewed=True):

delta_param = gamma / jnp.sqrt(1.0 + gamma**2)   # δ = γ/√(1+γ²)
xi          = -sigma * delta_param * jnp.sqrt(2.0 / jnp.pi)
residual    = delta_g_clean - xi
z           = gamma * residual / sigma
log_skew    = N*log(2) + jnp.sum(log_ndtr(z))     # Σ log Φ(γ r/σ)
return log_gauss + log_skew - log_jac

The log-CDF \(\ln\Phi(z)\) is evaluated via jax.scipy.special.log_ndtr, which is numerically stable for large negative \(z\) (where log(1+erf(z/√2)) would underflow). Execution time: ~592 μs/call (extra log_ndtr sum over \(N\) pixels). Setting gamma = 0 recovers the Gaussian to atol = 1e-6.

Function: make_log_likelihood()

MCMC inference 

Reference: Berlfein et al. 2024, Sec. 4; Goodman & Weare 2010 (emcee sampler).

Module: sys_mapping.inference

Physical motivation. Given the likelihood \(\mathcal{L}(\boldsymbol\Theta)\) and a prior \(\pi(\boldsymbol\Theta)\), Bayes’ theorem gives the posterior:

\[p(\boldsymbol\Theta \mid \hat{\boldsymbol\delta}_g, \mathbf{T}) \propto \mathcal{L}(\boldsymbol\Theta)\,\pi(\boldsymbol\Theta)\]

The prior on each \(a_i\) and \(b_i\) is a zero-mean Gaussian with unit variance, \(a_i \sim \mathcal{N}(0, 1)\), reflecting the expectation that contamination amplitudes are at most of order unity. The noise parameter \(\sigma > 0\) has a half-normal prior. In the rotated template basis (see Template PCA rotation) the prior factorises across parameters, improving MCMC mixing.

sys_mapping uses the affine-invariant ensemble sampler (emcee, Foreman-Mackey et al. 2013). This sampler uses \(n_{\rm walkers}\) parallel chains (walkers) that share information via stretch and walk moves. Key properties:

Works efficiently for posteriors with correlated parameters.
Automatically adapts to the posterior shape.
The only tuning parameter is the number of walkers (recommendation: \(n_{\rm walkers} \geq 4\,n_{\rm dim}\), typical choice \(n_{\rm walkers} = 150\) for \(n_{\rm dim} = 7\)).

The MAP estimate is taken as the posterior median of the flat chain (after discarding the burn-in phase). The posterior variance is estimated from the chain covariance matrix.

Recommended settings:

Parameter	Default / recommendation
`n_walkers`	150 (>= 4 x n_dim)
`n_steps`	1500 (beyond convergence)
`n_burn`	300 (discard early transient)
Convergence check	`emcee` autocorrelation tau

Technical implementation.

Log-probability wrapper (make_log_prob()): converts the JAX log-likelihood into an emcee-compatible Python callable. delta_g_obs and delta_t are converted to jnp.asarray once at construction time to avoid repeated host–device copies. A hard sigma <= sigma_min (default 1e-6) guard returns -inf without calling JAX, short-circuiting the dispatch overhead.

Vectorised mode (vectorize=True): builds a single jax.jit(jax.vmap(log_likelihood)) kernel that accepts the full (n_walkers, n_dim) position matrix and returns (n_walkers,) log-probabilities in one GPU dispatch per step. This is the primary source of GPU speedup and requires emcee >= 3.0.

Walker initialisation in run_mcmc():

p0[:, :n_cont] = rng.normal(0.0, 1e-3, (n_walkers, n_cont))
p0[:, n_cont]  = std(delta_g_obs) * (1 + rng.normal(0.0, 0.01, n_walkers))
# gamma (skew) initialised to N(0, 0.1) when use_skewed=True

The \(10^{-3}\) scatter in contamination amplitudes places walkers near zero (the prior centre) while providing enough diversity to explore the posterior within the first ~50 steps. sigma is initialised near the observed overdensity standard deviation, which is a good prior for the noise level.

Point estimate: get_mle_params() returns np.median(flat_chain, axis=0) — a robust estimator that is insensitive to outlier walkers and converges to the true parameter at \(O(1/\sqrt{n_{\rm samples}})\).

Covariance (get_param_covariance_from_chain()): slices the flat chain by column rather than unpacking sample by sample:

a_arr = flat_chain[:, :n_sys]                   # (n_samples, n_sys)
b_arr = flat_chain[:, n_sys : 2*n_sys]          # combined only
cov_a = np.cov(a_arr, rowvar=False)              # (n_sys, n_sys)

The full covariance (not just the diagonal) is needed to correctly propagate uncertainty through the PCA back-transformation cov_orig = R.T @ cov_rot @ R.

Performance. Total wall time scales as \(O(n_{\rm walkers} \times n_{\rm steps} \times N_{\rm pix} \times n_s)\). A typical run with n_walkers = 250, n_steps = 1500, \(N_{\rm pix} \approx 50\,000\), \(n_s = 5\) takes a few minutes on CPU; the JAX likelihood is compiled once (~227 ms) then runs at ~284 μs/eval.

Functions: make_log_prob(), run_mcmc(), get_mle_params(), get_param_variance_from_chain(), get_param_covariance_from_chain()

Template PCA rotation 

Reference: Berlfein et al. 2024, Appendix A.

Module: sys_mapping.correction

Physical motivation. Systematic template maps are often correlated with each other (e.g. seeing correlates with depth because long exposures under good seeing produce deeper images). When templates are correlated, the contamination amplitudes \(a_i\) cannot be determined independently — the likelihood has ridges in parameter space, MCMC chains mix slowly, and the inferred amplitudes in the original basis are unreliable.

PCA diagonalisation. The template covariance matrix is:

\[C_{ij} = \frac{1}{N}\sum_{p=1}^N t_i(p)\,t_j(p)\]

(a symmetric, positive semi-definite \(n_s \times n_s\) matrix). Its eigendecomposition is \(C = V D V^\top\) where \(V\) is the \(n_s \times n_s\) orthogonal matrix of eigenvectors and \(D = \mathrm{diag}(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{n_s})\) contains the eigenvalues (template variances in the rotated basis).

The rotated templates are:

\[\mathbf{T}' = V^\top \mathbf{T}, \qquad t'_j(p) = \sum_i V_{ji}\,t_i(p)\]

so that \(C' = V^\top C V = D\) is diagonal. The rotated templates are orthogonal by construction:

\[\frac{1}{N}\sum_p t'_i(p)\,t'_j(p) = \lambda_i\,\delta_{ij}\]

In the rotated basis, the likelihood factorises across parameters and MCMC convergence is faster. The rotation is applied before MCMC and the back-transformation is applied after:

\[a_i = (V\,\mathbf{a}')_i = \sum_j V_{ij}\,a'_j, \qquad b_i = (V\,\mathbf{b}')_i = \sum_j V_{ij}\,b'_j\]

The eigenvalues \(\lambda_j\) measure the variance of each rotated template and indicate how many independent systematic modes are present. Large gaps in the eigenvalue spectrum suggest that some templates are essentially redundant.

Technical implementation.

rotate_templates() performs the following steps on a (n_sys, n_pix) NumPy array:

cov = delta_t @ delta_t.T / n_pix          # (n_sys, n_sys), O(n_sys² × n_pix)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # symmetric → faster than eig
idx = np.argsort(eigenvalues)[::-1]         # descending order
rotation_matrix = eigenvectors[:, idx].T   # V^T, rows are eigenvectors
delta_t_rot = rotation_matrix @ delta_t    # (n_sys, n_pix)

numpy.linalg.eigh exploits the symmetry and positive semi-definiteness of the covariance matrix (LAPACK dsyevd driver), making it faster and more accurate than the general eig decomposition. Cost: \(O(n_s^3 + n_s \times N)\). At n_sys = 5, n_pix = 10\,000 the call takes approximately 177 μs.

Numerical guarantees:

Off-diagonal elements of the rotated covariance delta_t_rot @ delta_t_rot.T / n_pix are \(< 10^{-10}\) (orthogonality).
Rotation matrix satisfies R @ R.T = I to atol < 1e-12.
Sum of eigenvalues equals trace(cov) to rtol < 1e-10 (variance conservation).

transform_params_from_rotated() applies the inverse transformation a_orig = V @ a_rot where V = R.T:

V = rotation_matrix.T   # (n_sys, n_sys)
a_orig = V @ a_rot      # two small (n_sys, n_sys) × (n_sys,) products
b_orig = V @ b_rot

Cost: two \(n_s \times n_s\) matrix–vector multiplies (~4 μs). Roundtrip precision (rotate → infer → back-transform) is atol < 1e-12 in float64.

In run_decontamination() (MCMC methods), the covariance matrix is propagated back as:

cov_a = R.T @ cov_a_rot @ R   # (n_sys, n_sys) sandwich product

so that uncertainty in the rotated basis is correctly mapped to the original template basis without approximation.

Functions: rotate_templates(), transform_params_from_rotated()

Noise debiasing 

Reference: Berlfein et al. 2024, Eq. 21.

Module: sys_mapping.correction

Physical motivation. When the true contamination amplitude \(a_i\) is small (close to zero), the posterior median \(\hat{a}_i\) fluctuates around zero due to noise. The squared amplitude \(\hat{a}_i^2\) is therefore always positive even when the true amplitude is zero, and the expectation is:

\[\mathbb{E}[\hat{a}_i^2] = a_i^2 + \mathrm{Var}[\hat{a}_i]\]

This is the classical noise bias familiar from power spectrum estimation (where the noise power spectrum must be subtracted from the measured pseudo- \(C_\ell\)).

Debiased estimator. The noise-debiased squared amplitude is:

\[\tilde{a}_i^2 = \max\!\bigl(\hat{a}_i^2 - \mathrm{Var}[\hat{a}_i],\; 0\bigr)\]

and analogously:

\[\tilde{b}_i^2 = \max\!\bigl(\hat{b}_i^2 - \mathrm{Var}[\hat{b}_i],\; 0\bigr)\]

The \(\max(\cdot, 0)\) clip enforces the physical constraint \(a_i^2 \geq 0\) (squared amplitudes are non-negative). When the true amplitude is zero, the expected debiased estimate is zero on average.

The debiased amplitudes \(\tilde{a}_i^2, \tilde{b}_i^2\) are the inputs to the two-point correction formula (see below). Using the raw (biased) squared amplitudes would over-subtract the template auto-correlation, introducing a spurious dip in the corrected \(w(\theta)\).

Technical implementation. debias_params() is a pure NumPy function operating on (n_sys,) float64 arrays:

a_sq = np.maximum(a_hat**2 - var_a, 0.0)   # clip at 0 (squared amp ≥ 0)
b_sq = np.maximum(b_hat**2 - var_b, 0.0)

Cost: two element-wise squares, two subtractions, two np.maximum calls — approximately 4 μs regardless of \(n_s\). When the exact posterior variance is supplied, the debiased estimate recovers the true \(a_i^2\) to atol < 1e-10 (float64 round-off limited).

The var_a and var_b inputs are the diagonal of the MCMC posterior covariance returned by get_param_variance_from_chain(). In the PCA-rotated basis the off-diagonal covariance elements are small (templates are orthogonal after rotation), so the diagonal approximation is accurate.

Function: debias_params()

Two-point function correction 

Reference: Berlfein et al. 2024, Eq. 15–16.

Module: sys_mapping.correction

Derivation sketch. The angular two-point correlation function of the observed field is:

\[\hat{w}(\theta) = \bigl\langle \hat\delta_g(\hat{n})\,\hat\delta_g(\hat{n}')\bigr\rangle_\theta\]

Inserting the contamination model and expanding to first order in \(a_i^2\) and \(b_i^2\) (all cross-terms between different templates \(i \neq j\) vanish on average because templates are uncorrelated with \(\delta_g\) and with each other after normalisation):

\[\hat{w}(\theta) \approx w_g(\theta) + \sum_i b_i^2\,w_g(\theta)\,w_{t_i t_i}(\theta) + \sum_i a_i^2\,w_{t_i t_i}(\theta) = w_g(\theta)\!\left[1 + \sum_i b_i^2\,w_{t_i t_i}(\theta)\right] + \sum_i a_i^2\,w_{t_i t_i}(\theta)\]

where \(w_{t_i t_i}(\theta) = \langle t_i(\hat{n})\,t_i(\hat{n}')\rangle_\theta\) is the template auto-correlation at angular separation \(\theta\).

Solving for \(w_g(\theta)\) and replacing \(a_i^2 \to \tilde{a}_i^2\), \(b_i^2 \to \tilde{b}_i^2\) (debiased estimates):

\[\hat{w}_{\rm corr}(\theta) = \frac{\hat{w}(\theta) - \sum_i \tilde{a}_i^2\,w_{t_i t_i}(\theta)} {1 + \sum_i \tilde{b}_i^2\,w_{t_i t_i}(\theta)}\]

When to use. Apply this correction after measuring the raw \(w(\theta)\) from the galaxy catalog and the template auto-correlations \(w_{t_i t_i}(\theta)\) from the templates. The correction requires the debiased amplitudes from debias_params(). The corrected \(\hat{w}_{\rm corr}(\theta)\) is an unbiased estimate of \(w_g(\theta)\) to first order in the contamination amplitudes.

For the harmonic-space (power spectrum) equivalent, see correct_power_spectrum_harmonic().

Technical implementation.

compute_two_point_correction() operates on JAX arrays and is the inner kernel:

add_bias  = jnp.einsum("i,ij->j", a_sq, template_correlations)  # (n_bins,)
mult_bias = jnp.einsum("i,ij->j", b_sq, template_correlations)  # (n_bins,)
w_corr    = (w_obs - add_bias) / (1.0 + mult_bias)

Two Einstein sums over (n_sys, n_bins) produce (n_bins,) vectors; cost is \(O(n_s \times n_{\rm bins})\) — negligible. Execution time: approximately 576 μs at n_sys = 5, n_bins = 15 (dominated by JAX dispatch overhead). Matches the analytic formula to rtol = 1e-6 (single float32 einsum).

The convenience wrapper correct_two_point_function() chains the NumPy debiasing step with the JAX correction kernel and converts the result back to a NumPy array:

a_sq, b_sq = debias_params(a_hat, b_hat, var_a, var_b)   # NumPy, ~4 μs
w_corr = compute_two_point_correction(                    # JAX, ~576 μs
    jnp.asarray(w_obs), jnp.asarray(a_sq),
    jnp.asarray(b_sq), jnp.asarray(template_correlations))
return np.asarray(w_corr)                                 # back to NumPy

The template_correlations array (shape (n_sys, n_bins)) is the auto-correlation \(w_{t_i t_i}(\theta)\) of each template map, measured with the same angular binning as the galaxy \(w(\theta)\).

Functions: correct_two_point_function(), correct_power_spectrum_harmonic()

Likelihood ratio test (model selection)

Reference: Berlfein et al. 2024, Eq. 19; Wilks 1938 (asymptotic theorem).

Module: sys_mapping.model_selection

Physical motivation. Given data, we want to know whether the multiplicative contamination terms \(b_i\) are statistically necessary, or whether the simpler additive model with \(b_i = 0\) provides an adequate description. The LRT is the standard frequentist tool for comparing nested models.

The LRT statistic. For two nested models \(\mathcal{M}_0 \subset \mathcal{M}_1\) with MLEs \(\hat{\Theta}_0\) and \(\hat{\Theta}_1\):

\[\lambda_{\rm LR} = 2\!\left[\ln\mathcal{L}(\hat{\Theta}_1) - \ln\mathcal{L}(\hat{\Theta}_0)\right] \;\xrightarrow{N\to\infty}\; \chi^2(r), \quad r = k_1 - k_0\]

where \(k_0, k_1\) are the numbers of free parameters in each model. For the additive (\(n_s + 1\)) vs combined (\(2n_s + 1\)) comparison, \(r = n_s\).

Interpretation. A large \(\lambda_{\rm LR} \gg \chi^2_{r,0.95}\) (the 95th percentile of the \(\chi^2(r)\) distribution) means the combined model is strongly preferred — the data require multiplicative contamination. A small \(\lambda_{\rm LR} \approx 0\) means the simpler additive model is sufficient.

Note on the sign convention. The LRT statistic is always \(\lambda_{\rm LR} \geq 0\) because adding parameters cannot decrease the maximised log-likelihood. If the two likelihoods are nearly equal, \(\lambda_{\rm LR} \approx 0\) and the data cannot distinguish the models.

Technical implementation. The LRT uses the same MCMC posterior median as the point estimate for both models. The test statistic is computed by evaluating make_log_likelihood() at the two parameter vectors and differencing:

lrt = 2.0 * (log_lik_combined(theta_1) - log_lik_additive(theta_0))

Both log-likelihood calls use the JIT-compiled function objects, so the evaluation cost is two forward passes through the likelihood (~284 μs each). The p-value is obtained from the \(\chi^2(r)\) CDF via scipy.stats.chi2.sf(lrt, df=r) where r = n_sys for the additive-vs-combined comparison.

Because the MCMC posterior median — not a true MLE from gradient optimisation — is used as the parameter estimate, the LRT statistic is a conservative approximation: the posterior median is inside the posterior but may not be exactly at the likelihood maximum, so \(\lambda_{\rm LR}\) may be slightly underestimated. In practice, for well-sampled posteriors (n_steps >= 1500) the bias is negligible.

Function: likelihood_ratio_test()

Pseudo-\(C_\ell\) power spectrum 

References: Elsner, Leistedt & Peiris 2016 (MNRAS 456, 2095), Sec. 2–3; Elsner, Leistedt & Peiris 2017 (MNRAS 465, 1847); Leistedt & Peiris 2014 (MNRAS 444, 2); Ho et al. 2012 (ApJ 761, 14).

Module: sys_mapping.power_spectrum

Physical motivation. Instead of working with the real-space two-point function \(w(\theta)\), one can characterise galaxy clustering in harmonic space via the angular power spectrum \(C_\ell\). Each multipole \(\ell\) probes a characteristic angular scale \(\theta \sim \pi/\ell\) (e.g. \(\ell = 100\) corresponds to \(\approx 1.8°\)). Harmonic-space estimators allow easy comparison with theoretical predictions computed via Boltzmann codes (CAMB, CLASS).

Pseudo-:math:`C_ell` estimator. Because the galaxy survey covers only a fraction of the sky, spherical harmonic modes are mixed by the survey mask \(W(p)\). The pseudo-:math:`C_ell` estimator simply applies the mask and computes the power spectrum of the masked map:

\[\tilde{C}_\ell = \frac{1}{2\ell+1}\sum_{m=-\ell}^{\ell} \left|\sum_p W(p)\,\hat\delta_g(p)\,Y_{\ell m}^*(p)\,\Omega_p\right|^2\]

where \(Y_{\ell m}(p)\) are the spherical harmonic basis functions and \(\Omega_p = 4\pi/N_{\rm pix}\) is the pixel solid angle. In sys_mapping this is computed by passing the masked map to healpy.anafast.

The pseudo-\(C_\ell\) is related to the true power spectrum \(C_\ell\) via the MASTER equation (Mode-coupling Matrix in Coupled Harmonics for Angular Spectra Techniques, Hivon et al. 2002):

\[\langle\tilde{C}_\ell\rangle = \sum_{\ell'} M_{\ell\ell'}\,C_{\ell'}\]

where the mode-coupling matrix \(M_{\ell\ell'}\) depends only on the mask power spectrum \(\tilde{C}_\ell^{WW}\).

Template subtraction (TS) in harmonic space (Elsner+2016, Eq. 1–2). Systematic contamination adds a spurious contribution to the measured pseudo-\(C_\ell\). For an additive contaminant with amplitude \(\hat\alpha_i\), the cleaned power spectrum is:

\[\tilde{C}_\ell^{\rm TS} = \hat{C}_\ell^{dd} - \sum_i \hat\alpha_i\,\hat{C}_\ell^{t_i t_i}\]

where \(\hat{C}_\ell^{dd}\) is the measured pseudo-\(C_\ell\) of the galaxy map and \(\hat{C}_\ell^{t_i t_i}\) is the pseudo-\(C_\ell\) of template \(i\).

Harmonic bias (Elsner+2016, Eq. 8). Template subtraction introduces a residual additive bias:

\[b_\ell = -\frac{n_s}{2\ell+1}\]

This arises because subtracting a linear combination of templates removes \(n_s\) degrees of freedom from the sky, reducing the effective number of modes at each \(\ell\) by \(n_s\). At low multipoles (\(\ell \lesssim 10\)) and with many templates (\(n_s \gtrsim 10\)) this bias can be significant and must be corrected.

Mode Projection (BMP/EMP). An alternative to template subtraction is to project out the template modes from the covariance matrix before power spectrum estimation. In the Basic Mode Projection (BMP), the pixel-pixel covariance matrix is modified to marginalise over the template amplitudes. The Extended Mode Projection (EMP, Leistedt+2014) additionally discards multipoles where the template signal exceeds a threshold \(k\) times the noise, with an analytical bias correction.

Technical implementation.

Pseudo-Cℓ measurement (measure_pseudo_cl()):

masked_map = delta_g * mask_float          # element-wise multiplication
cl = hp.anafast(masked_map, lmax=lmax,
                use_pixel_weights=True)    # healpy spherical harmonic transform
ell = np.arange(len(cl))

The mask is applied as a float weight map (True → 1.0, False → 0.0). use_pixel_weights=True enables healpy’s pixel-area weighting scheme, which corrects for the geometric distortion of the HEALPix equal-area pixels at high Galactic latitude and improves accuracy at \(\ell \gtrsim N_{\rm side}\). The returned pseudo_cl is the power of the masked map and must be divided by \(W_2 = N_{\rm pix}^{-1}\sum_p W^2(p)\) (the mask power) for a first-order sky-fraction correction.

Template subtraction (subtract_template_cl()): calls hp.anafast once per template to compute \(\tilde{C}_\ell^{t_i t_i}\), then subtracts the weighted contribution:

cl_cleaned = pseudo_cl.copy()
for a_i, t_i in zip(alpha, delta_t):
    masked_template = t_i * mask_float
    cl_t = hp.anafast(masked_template, lmax=lmax, use_pixel_weights=True)
    cl_cleaned -= a_i * cl_t

Cost: \(O(n_s)\) harmonic transforms. Each transform has complexity \(O(N_{\rm pix} \log N_{\rm pix})\) (the healpy SHT).

Harmonic bias correction (harmonic_bias()): a closed-form expression requiring no iteration:

bias = -n_templates / (2.0 * ell + 1.0)   # (n_ell,)

Mode Projection (mode_projection_bias()): both BMP and EMP use the same harmonic_bias formula; EMP additionally applies a per-multipole mask based on the template-to-signal ratio:

template_ratio = |harmonic_bias(1, ell)| / (cl_signal / n_templates)
project_mask   = template_ratio > threshold
n_projected    = project_mask.sum()
bias = np.where(project_mask, harmonic_bias(n_projected, ell), 0.0)

When coupling_matrix is not None, the scalar bias vector is propagated through the mode-coupling matrix \(b_\ell^{\rm pseudo} = M_{\ell\ell'} b_{\ell'}\) before subtraction. Passing coupling_matrix=None uses the identity (full-sky approximation).

Functions: measure_pseudo_cl(), subtract_template_cl(), harmonic_bias(), mode_projection_bias()

Block bootstrap covariance 

References: Berlfein et al. 2024, Sec. 6.2; Ross et al. 2011.

Module: sys_mapping.bootstrap

Physical motivation. Galaxy overdensity maps have spatially correlated structure on scales of tens to hundreds of Mpc (the same clustering signal we are trying to measure). This means that the uncertainty in any estimator computed from these maps cannot be estimated by treating pixels as independent — the effective number of degrees of freedom is far smaller than the pixel count. A spatial block bootstrap preserves the spatial covariance structure by resampling patches rather than individual pixels.

Algorithm. The survey footprint is divided into \(K\) approximately equal-area spatial patches by coarsening to a lower-resolution HEALPix grid. Each bootstrap resample draws \(K\) patches with replacement, then evaluates the estimator \(\hat\theta\) on the resampled dataset. The variance estimate is:

\[\widehat{\rm Var}[\hat\theta] = \frac{1}{B-1}\sum_{b=1}^B \bigl(\hat\theta_b - \bar{\hat\theta}\bigr)^2, \qquad \bar{\hat\theta} = \frac{1}{B}\sum_{b=1}^B \hat\theta_b\]

where \(B\) is the number of bootstrap resamples (typical: \(B = 100–500\)).

Typical settings. Use \(K \approx 10–30\) patches and \(B \approx 100–200\) resamples. Fewer patches reduce the spatial resolution of the covariance estimate; more patches increase computational cost linearly.

Technical implementation.

Patch assignment (_assign_patches()): the footprint is partitioned by coarsening to a lower HEALPix NSIDE:

nside_patch = max(1, nside // int(sqrt(nside2npix(nside) / n_patches)))
theta, phi  = hp.pix2ang(nside, pixel_indices)   # unmasked pixels only
coarse_pix  = hp.ang2pix(nside_patch, theta, phi)
patch_ids   = np.unique(coarse_pix, return_inverse=True)[1]  # 0..K-1

This ensures patches are approximately equal area (HEALPix is equal-area at each resolution) and spatially contiguous, preserving large-scale galaxy clustering modes within each patch. The coarse NSIDE is chosen so that there are \(\approx n_{\rm patches}\) unique coarse pixels.

Bootstrap loop (block_bootstrap_variance()):

for _ in range(n_bootstrap):
    sampled_patches = rng.choice(unique_patches, size=K, replace=True)
    pixel_mask = np.concatenate(
        [np.where(patch_ids == p)[0] for p in sampled_patches])
    dg_boot = delta_g_obs[pixel_mask]
    dt_boot = delta_t[:, pixel_mask]
    bootstrap_estimates.append(estimator(dg_boot, dt_boot))
variance = np.var(np.stack(bootstrap_estimates), axis=0, ddof=1)

Resampling K patches with replacement means that some patches are included multiple times and others are omitted, which mimics the sampling variance of the survey area. The ddof=1 divisor gives an unbiased estimate of the variance of the estimator.

Performance. At n_boot = 50, n_patches = 8, NSIDE 32, the entire bootstrap loop takes approximately 32 ms for a simple mean estimator. Total cost scales linearly as \(O(n_{\rm boot} \times \text{cost}(\hat\theta))\).

Function: block_bootstrap_variance()

Jack-knife covariance 

References: Ross et al. 2011 (MNRAS 417, 1350), Sec. 4.3; Ho et al. 2012 (ApJ 761, 14).

Module: sys_mapping.bootstrap

Physical motivation. The jack-knife is a deterministic leave-one-out estimator that does not require random resampling. It is more stable than the bootstrap for a fixed number of patches because it uses all \(K\) leave-one-out datasets rather than a random subset.

Algorithm. For each of the \(K\) patches, remove that patch and evaluate the estimator on the remaining \(K-1\) patches. The jack-knife covariance matrix is:

\[\widehat{\rm Cov}[\hat\theta] = \frac{K-1}{K} \sum_{k=1}^K \bigl(\hat\theta_{(-k)} - \bar{\hat\theta}\bigr) \bigl(\hat\theta_{(-k)} - \bar{\hat\theta}\bigr)^\top\]

where \(\hat\theta_{(-k)}\) is the estimator with patch \(k\) removed, and \(\bar{\hat\theta} = K^{-1}\sum_k \hat\theta_{(-k)}\) is the mean over jack-knife samples. The prefactor \((K-1)/K\) ensures the estimator is unbiased for the true covariance in the limit of large \(K\).

Comparison with block bootstrap. For a well-designed spatial decomposition, the jack-knife and block bootstrap covariances should agree to within \(\sim 20\%\). The jack-knife is faster (deterministic, \(K\) estimator calls vs \(B\) for bootstrap) but more sensitive to the choice of patch geometry. The bootstrap can capture asymmetric distributions that the jack-knife misses.

Technical implementation. The jack-knife uses the same _assign_patches helper as the bootstrap, so patch geometry is identical. The loop is fully deterministic:

leave_one_out = []
for k in unique_patches:
    keep = patch_ids != k            # boolean mask
    estimate_k = estimator(delta_g_obs[keep], delta_t[:, keep])
    leave_one_out.append(estimate_k)
estimates = np.stack(leave_one_out)  # (K, n_out)
mean_est  = estimates.mean(axis=0)
residuals = estimates - mean_est
cov = (K-1)/K * (residuals.T @ residuals)

The outer product residuals.T @ residuals computes the full (n_out, n_out) covariance matrix in one BLAS-3 call. For scalar estimators (n_out = 1) this reduces to a sum of squared deviations. A UserWarning is issued if n_patches < 5 (the covariance estimate is unreliable when few patches are available).

Cost: \(K\) evaluations of the estimator (compared to \(B\) for bootstrap, with \(K \ll B\) typically). For K = 10 patches the jack-knife is 10–20× faster than 100–200 bootstrap resamples.

Function: jackknife_covariance()

ElasticNet regression 

Reference: Weaverdyck & Huterer 2021 (MNRAS 503, 5061), Eq. 9–10.

Module: sys_mapping.regression

Physical motivation. When there are many systematic templates (\(n_s \gg 1\)) and only a few carry genuine contamination, standard OLS tends to over-fit: it assigns non-zero amplitudes to all templates, including those that are noise-dominated. Regularised regression shrinks unimportant amplitudes toward zero while retaining the dominant ones.

ElasticNet loss function. ElasticNet minimises:

\[\mathcal{L}(\boldsymbol\alpha) = \frac{1}{2N} \left\|\hat{\boldsymbol\delta}_g - \mathbf{T}^\top\boldsymbol\alpha\right\|_2^2 + \underbrace{\lambda_1\|\boldsymbol\alpha\|_1}_{\rm Lasso\;(L1)} + \underbrace{\lambda_2\|\boldsymbol\alpha\|_2^2}_{\rm Ridge\;(L2)}\]

where \(\boldsymbol\alpha = (a_1,\ldots,a_{n_s})^\top\) are the additive contamination amplitudes, \(\mathbf{T}\) is the \(n_s \times N\) template matrix, and the regularisation strengths satisfy \(\lambda_1 = \lambda\,\rho\) and \(\lambda_2 = \lambda(1-\rho)/2\) for a total regularisation \(\lambda\) and L1 fraction \(\rho \in [0,1]\) (l1_ratio in the code).

The L1 term (Lasso regularisation) promotes sparsity — small amplitudes are shrunk exactly to zero, automatically selecting the dominant templates. The L2 term (Ridge regularisation) stabilises correlated templates by sharing the coefficient equally. The ElasticNet combination is preferable when templates are both numerous and correlated.

When alpha_reg=None the optimal \(\lambda\) is found by \(k\)-fold cross-validation on the pixel grid.

Pixel weights. After fitting, systematic weights are assigned as:

\[w(p) = \frac{1}{\max\!\left(1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p),\;\epsilon\right)}\]

with \(\epsilon = 10^{-6}\) to prevent division by zero.

Requires: scikit-learn >= 1.3 (pip install sys_mapping[regression])

Technical implementation.

elasticnet_contamination_fit() transposes the template matrix so that scikit-learn receives the standard design matrix convention (samples × features):

X = delta_t.T           # (n_pix, n_sys) — samples × features for sklearn
y = delta_g_obs         # (n_pix,)

When alpha_reg=None (auto-tune), sklearn.linear_model.ElasticNetCV runs cv_folds-fold cross-validation on the pixel grid. The pixel grid is split into contiguous chunks of roughly \(N/k\) pixels each (scikit-learn’s default stratified-free split), so folds may mix spatially. For strongly spatially correlated overdensity maps this can underestimate the optimal \(\lambda\); in practice the effect is small relative to the statistical noise. The CV-selected alpha_ is stored in cv_info.

model = ElasticNetCV(l1_ratio=l1_ratio, cv=cv_folds, n_jobs=-1)
model.fit(X, y)
alpha_reg = float(model.alpha_)           # optimal λ from CV
cv_scores = model.mse_path_.mean(axis=-1) # mean MSE vs λ grid

The fitted coefficients are extracted as model.coef_ (shape (n_sys,)). Per-pixel weights are then computed and clipped to \([1/20, 20]\) to prevent numerical blow-up in pixels where \(1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p) \approx 0\):

denominator = np.maximum(1.0 + alpha_hat @ delta_t, 1e-6)
weights = np.clip(1.0 / denominator, 1.0/20.0, 20.0)

The clip bound _ISD_MAX_WEIGHT = 20 is shared with the ISD pipeline; it corresponds to a denominator floor of 0.05 — generous enough not to bias well-behaved scenarios.

Functions: elasticnet_contamination_fit(), method_comparison()

Iterative Systematics Decontamination (ISD)

References: Rodríguez-Monroy et al. 2025; Rezaie et al. 2020.

Module: sys_mapping.regression

Physical motivation. Standard OLS fits a linear model \(\hat\delta_g \approx \sum_i \alpha_i t_i\). When contamination is multiplicative, the correct model is \(\hat\delta_g \approx \delta_g(1 + \sum_i b_i t_i)\), which is non-linear in the templates. ISD approximates this non-linearity by expanding the template basis with polynomial cross-terms and iterating the OLS solution.

Algorithm. Starting with the original templates \(\{t_i\}_{i=1}^{n_s}\), the expanded basis at polynomial order \(d\) includes all products \(t_i^{k_1} t_j^{k_2}\cdots\) with \(k_1 + k_2 + \cdots \leq d\). At order \(d=1\) this is just the original templates (standard OLS). At order \(d=3\) the cross terms \(t_i t_j\) (for \(i \leq j\)) and triple products \(t_i t_j t_k\) are added. For \(n_s\) templates the expanded basis has \(\binom{n_s + d}{d} - 1\) columns (e.g. 55 columns for \(n_s = 5, d = 3\)).

The iterative loop:

Form the weighted normal equations using per-pixel weights \(w^{(k)}(p)\):

\[\bigl(\mathbf{X}^\top \mathbf{W}^{(k)} \mathbf{X} + \mathbf{\Lambda}\bigr) \hat{\boldsymbol\alpha}^{(k)} = \mathbf{X}^\top \mathbf{W}^{(k)} \hat{\boldsymbol\delta}_g\]

where \(\mathbf{X}\) is the \((N_{\rm pix} \times N_{\rm exp})\) expanded design matrix (constant across iterations), \(\mathbf{W}^{(k)} = \mathrm{diag}(w^{(k)})\), and \(\mathbf{\Lambda}\) is an optional ridge penalty matrix (see the ridge regularisation note below).
Update weights from the linear coefficients only (first \(n_s\) entries of \(\hat{\boldsymbol\alpha}^{(k)}\)):

\[w^{(k+1)}(p) = \frac{1}{1 + \hat{\boldsymbol\alpha}^{(k)}_{\rm lin} \cdot \mathbf{t}(p)}\]
Repeat until convergence: \(\|\mathbf{w}^{(k+1)} - \mathbf{w}^{(k)}\|_2 / \|\mathbf{w}^{(k)}\|_2 < \varepsilon = 10^{-5}\).

The polynomial cross-terms improve the linear coefficient estimates but are not propagated into the weight formula, which keeps the same form as plain OLS. The cleaned field is recovered as \(\hat\delta_g^{\rm clean}(p) = w(p)(1 + \hat\delta_g^{\rm obs}(p)) - 1\).

Note

Speed (v0.2.4). Since \(\mathbf{X}\) is constant across iterations, it is precomputed once outside the loop as \(\mathbf{X}^\top\) (shape \(N_{\rm exp} \times N_{\rm pix}\)). The per-iteration cost is then one BLAS-3 matrix multiply

\[\mathbf{X}^\top \mathbf{W}^{(k)} \mathbf{X} = \underbrace{(\mathbf{X}^\top \odot \mathbf{w}^{(k)})}_{\text{broadcast scale}} \mathbf{X}\]

producing an \((N_{\rm exp} \times N_{\rm exp})\) matrix — tiny regardless of \(N_{\rm pix}\) — followed by lstsq on this small system. This replaces the previous approach of computing the SVD of the full \((N_{\rm pix} \times N_{\rm exp})\) matrix at each iteration, giving a 5–15× speed-up per iteration for realistic pixel counts (\(N_{\rm pix} \gtrsim 10^4\)).

Note

ISD-3 accuracy: ridge regularisation on polynomial columns (v0.2.4). A key source of ISD-3 instability is a fixed-point inconsistency: the regression minimises residuals over all \(N_{\rm exp}\) columns (including polynomial cross-terms), but only the first \(n_s\) linear coefficients drive the weight update. When the polynomial columns are correlated with the linear ones — which is always the case for survey systematics — they absorb a share of the linear signal. The linear coefficients \(\hat{\boldsymbol\alpha}_{\rm lin}\) are left systematically small, and the iteration converges to a fixed point where the contamination has not actually been removed.

The fix is a ridge penalty \(\lambda_{\rm poly}\) applied exclusively to the polynomial-only columns (indices \(n_s:\) of the expanded basis):

\[\mathbf{\Lambda} = \mathrm{diag}\!\underbrace{(0,\ldots,0}_{n_s}, \underbrace{\lambda_{\rm poly},\ldots,\lambda_{\rm poly}}_{N_{\rm exp}-n_s})\]

The linear columns are never penalised, so the weight update is not biased. The penalty suppresses the polynomial terms just enough to prevent them from over-fitting noise or absorbing linear signal.

Recommended value: \(\lambda_{\rm poly} \approx 10^{-3}\,{\rm Var}(\hat\delta_g)\). This is small enough to leave the result unaffected when the polynomial terms are genuinely needed, and large enough to stabilise ISD-3 when templates are strongly correlated. Exposed as the lambda_poly argument to iterative_systematics_decontamination() and as isd_lambda_poly in run_decontamination(). Default is 0.0 (plain OLS, backward-compatible).

Convergence. ISD-1 reliably converges within ≲ 25 iterations for all tested configurations. ISD-3 with lambda_poly = 0 may still oscillate when templates are strongly correlated; setting lambda_poly = 1e-3 * np.var(delta_g_obs) resolves this in practice. In the worst case the function returns weights at max_iter, which remain a useful (if not fully converged) correction.

When to use.

ISD-1 (poly_order=1): equivalent to OLS; fastest option; use as a baseline.
ISD-3 (poly_order=3): partially corrects multiplicative contamination without a full MCMC run. Set isd_lambda_poly = 1e-3 * np.var(delta_g_obs) when ISD-3 produces implausible solutions or fails to converge.
MCMC-combined: preferred for high-precision analyses with known multiplicative contamination — it jointly infers additive and multiplicative amplitudes with full posterior uncertainties.

Technical implementation — polynomial expansion. The expanded template basis is built with itertools.combinations_with_replacement, which generates all multi-indices of degree \(\leq d\) in lexicographic order:

rows = [delta_t]   # (n_sys, n_pix) — linear terms always first
for deg in range(2, poly_order + 1):
    for combo in combinations_with_replacement(range(n_sys), deg):
        product = ones(n_pix)
        for idx in combo:
            product *= delta_t[idx]
        rows.append(product[newaxis, :])
delta_t_expanded = vstack(rows)  # (n_expanded, n_pix)

For n_sys = 5, poly_order = 3 this yields \(\binom{5+3}{3} - 1 = 55\) columns. If n_pix < 10 × n_expanded the system would be poorly conditioned; poly_order is reduced automatically with a UserWarning.

Technical implementation — weighted normal equations. The design matrix X = delta_t_expanded.T (shape (n_pix, n_expanded)) is computed once before the iteration loop. Each iteration updates only the diagonal weight matrix:

Xt = delta_t_expanded            # (n_expanded, n_pix), constant
X  = delta_t_expanded.T          # (n_pix, n_expanded), constant
ridge_diag = zeros(n_expanded)
ridge_diag[n_sys:] = lambda_poly  # penalty on polynomial columns only
for iteration in range(1, max_iter + 1):
    Xw   = Xt * weights           # (n_expanded, n_pix): broadcast scale
    XtWX = Xw @ X                 # (n_expanded, n_expanded): BLAS-3
    XtWX.flat[::n_expanded+1] += ridge_diag   # in-place diagonal add
    XtWy = Xw @ delta_g_obs       # (n_expanded,): BLAS-1
    alpha_hat_all, *_ = lstsq(XtWX, XtWy)    # small system

The key insight is that Xt * weights is a row-broadcast (scale each row of Xt by the corresponding weight), which is a single NumPy ufunc call, not a loop. The resulting (n_expanded, n_expanded) matrix (e.g. 55×55) is tiny and the subsequent lstsq is negligible. This replaces the previous approach of computing the SVD of the full (n_pix × n_expanded) matrix at each iteration, yielding a 5–15× speed-up for \(N_{\rm pix} \gtrsim 10^4\).

Weight update and clipping. Only the first n_sys (linear) coefficients drive the weight update:

weights = 1.0 / max(1.0 + alpha_hat_all[:n_sys] @ delta_t, 1e-6)
weights = np.clip(weights, 1.0/20.0, 20.0)   # hard bound

Pixels where 1 + a@t → 0 would otherwise receive weights up to \(10^6\), making all subsequent weighted OLS fits numerically meaningless. The clip to \([1/20, 20]\) keeps the iteration stable.

Convergence criterion.

rel_change = ||w_new - w_old|| / (||w_old|| + 1e-30)
if rel_change < tol (default 1e-5): break

ISD-1 typically converges in \(\lesssim 25\) iterations. ISD-3 with lambda_poly = 0 may oscillate when templates are strongly correlated; setting lambda_poly = 1e-3 × Var(δ_g) resolves this.

Two-pass outlier handling (in run_decontamination()). After pass 1, pixels that hit the weight clip boundary are flagged as outliers:

boundary     = 20.0 * (1 - 1e-6)
outlier_mask = (weights >= boundary) | (weights <= 1.0/boundary)

If any outliers exist and there are at least \(\max(10 n_s, 50)\) clean pixels, pass 2 refits ISD on the clean subset only, then applies the clean coefficients to all pixels (including outliers). This two-pass scheme prevents the handful of pathological pixels from biasing the overall fit. The isd_outlier_mask and isd_masked_fraction fields in the result dict report how many pixels were flagged.

Functions: iterative_systematics_decontamination(), run_decontamination()

Null tests and SNR ranking 

References: Ross et al. 2011, Sec. 3.2 (null tests); Tanidis et al. 2026 (SNR ranking); Weaverdyck & Huterer 2021 (method comparison); Rodríguez-Monroy et al. 2025 (footprint masking).

Module: sys_mapping.diagnostics

Null test. After applying the systematic correction, the per-pixel weights \(w(p)\) should be statistically uncorrelated with all template maps. If the correction is complete, the Pearson correlation:

\[r_i = \frac{\sum_p \bigl(w(p) - \bar{w}\bigr)\bigl(t_i(p) - \bar{t}_i\bigr)} {\sqrt{\sum_p(w(p)-\bar{w})^2}\,\sqrt{\sum_p(t_i(p)-\bar{t}_i)^2}}\]

should satisfy \(|r_i| \approx 0\) for all \(i\). A significant residual correlation indicates that template \(i\) has not been adequately corrected. Permutation p-values are computed by shuffling \(w(p)\) and recomputing \(r_i\) a large number of times.

SNR ranking. Three estimators rank templates by their contaminating power:

"template" — \({\rm SNR}_i = |\hat\alpha_i| / \sigma_{\hat\alpha_i}\), where \(\sigma_{\hat\alpha_i}\) is the OLS standard error. This is the standard frequentist significance of each template in the linear fit.
"data" — \({\rm SNR}_i = |\mathrm{Corr}(\hat\delta_g, t_i)|\), the absolute Pearson correlation between the observed overdensity and template \(i\). Fast to compute; does not require fitting.
"peak" — the peak value of the cross pseudo-\(C_\ell\) \(\hat{C}_\ell^{dt_i}\) divided by the noise level. Identifies templates that contaminate on specific angular scales.

Footprint masking sensitivity. Cutting the survey at different mask thresholds (progressively removing pixels with extreme template values) and measuring how the fitted amplitudes \(\hat\alpha_i\) change is a powerful sanity check: genuinely contaminated fields should show stable or decreasing amplitudes as the worst-affected pixels are removed (Rodríguez-Monroy+2025).

Technical implementation.

Null test (null_test_cross_correlations()): the Pearson correlation is computed in a numerically stable form using centred vectors:

w_centered = weights - weights.mean()
w_norm     = sqrt(sum(w_centered**2))
for i, t_i in enumerate(delta_t):
    t_centered = t_i - t_i.mean()
    t_norm     = sqrt(sum(t_centered**2))
    r[i] = sum(w_centered * t_centered) / (w_norm * t_norm)

Division by zero is guarded: if either norm is < 1e-30, r[i] = 0.0 is returned rather than raising an exception.

Permutation p-values shuffle only the weight vector (not the templates), preserving the spatial structure of the templates:

for _ in range(n_bootstrap):
    w_shuffled = rng.permutation(w_centered)
    r_perm = abs(sum(w_shuffled * t_centered) / (w_norm * t_norm))
    if r_perm >= |r_obs|: count_extreme += 1
p_value = (count_extreme + 1) / (n_bootstrap + 1)   # Laplace smoothing

The +1 in numerator and denominator is the standard conservative correction that ensures p_value > 0 even when no permutation exceeds the observed statistic.

SNR ranking (snr_template_ranking()):

``”template”`` — fits OLS on the full template matrix X = delta_t.T, estimates residual variance \(\hat\sigma^2 = \text{RSS}/(N-n_s)\), and computes the OLS t-statistic per parameter:

alpha_hat, *_ = lstsq(X, delta_g_obs)
sigma2        = sum(residual**2) / max(n_pix - n_sys, 1)
XtX_inv       = pinv(X.T @ X)
param_var     = sigma2 * diag(XtX_inv)
snr           = |alpha_hat| / sqrt(maximum(param_var, 1e-30))

Uses np.linalg.pinv for numerical stability when templates are nearly collinear.

``”data”`` — computes |Corr(delta_g, t_i)| in a loop; cost \(O(n_s \times N)\).

``”peak”`` — calls hp.anafast twice per template (once for the galaxy map, once cross with each template), then normalises by the median power of the galaxy auto-spectrum as a noise-level proxy:

cl_g        = hp.anafast(delta_g_obs, lmax=lmax)
noise_level = median(|cl_g|) + 1e-30
for t_i in delta_t:
    cl_cross = hp.anafast(delta_g_obs, map2=t_i, lmax=lmax)
    snr[i]   = max(|cl_cross|) / noise_level

Footprint masking diagnostics (footprint_mask_diagnostics()): pixels are ranked by their RMS template value:

rms_per_pixel = sqrt(mean(delta_t**2, axis=0))  # (n_pix,): ∑ t²_i / n_s
rank_order    = argsort(rms_per_pixel)           # ascending

For each masking level the lowest-rms pixels are dropped first, ensuring that the surviving pixels have progressively higher systematic leverage. OLS is refitted on the surviving pixels at each level, and the amplitude scatter across levels quantifies sensitivity to the masking threshold:

alpha_all[k], *_ = lstsq(delta_t[:, keep].T, delta_g_obs[keep])
scatter = std(alpha_all, axis=0)   # (n_sys,): near zero → robust

Functions: null_test_cross_correlations(), snr_template_ranking(), footprint_mask_diagnostics()

Mock catalog generation 

References: Berlfein et al. 2024 (validation strategy); Coles & Jones 1991 (lognormal field model); Weaverdyck & Huterer 2021 (mock injection tests).

Module: sys_mapping.mocks

Physical motivation. Validating a systematic decontamination method requires knowing the true underlying galaxy field and the true contamination amplitudes. Synthetic (mock) catalogs are generated with a known ground truth, so parameter recovery and decontamination quality can be measured exactly — something impossible with real data.

Lognormal galaxy field. The true galaxy overdensity is:

\[\delta_g^{\rm true}(p) = e^{G(p) - \sigma^2/2} - 1\]

where \(G(p)\) is a zero-mean Gaussian random field drawn from a power spectrum \(C_\ell^G \propto (\ell+1)^{-2}\), normalised so that the pixel-to-pixel variance equals \(\sigma^2\). This form ensures:

\(\delta_g^{\rm true}(p) > -1\) everywhere (physically: number density \(n_g \propto 1 + \delta_g > 0\)).
\(\mathbb{E}[\delta_g] = 0\) by the normalisation of the exponential.
The distribution is positively skewed (characteristic of the real galaxy distribution).

Poisson sampling. Galaxy counts are drawn as:

\[n_g(p) \sim {\rm Poisson}\!\bigl[\bar{n}\,(1 + \hat\delta_g(p))\bigr]\]

where \(\bar{n}\) is the mean galaxy count per pixel (set by the n_mean parameter) and \(\hat\delta_g(p)\) is the contaminated overdensity. A random catalog of \(N_r = 8\bar{n}N\) objects is generated uniformly over the unmasked footprint.

Technical implementation.

Lognormal field generation (generate_lognormal_field()): a Gaussian random field \(G(p)\) is drawn with healpy.synfast from a power-law angular power spectrum \(C_\ell^G \propto (\ell+1)^{-2}\), normalised so that \(\mathrm{Var}(G) = \sigma^2\):

ell  = np.arange(lmax + 1)
cl_g = (ell + 1.0)**(-2)
cl_g /= cl_g.sum()       # normalise to unit variance
cl_g *= sigma**2          # scale to target variance
G    = hp.synfast(cl_g, nside=nside, lmax=lmax)
delta_g_true = np.exp(G - sigma**2 / 2.0) - 1.0   # ensures E[δ] = 0

The \(-\sigma^2/2\) shift is the standard log-normal mean correction that ensures \(\mathbb{E}[\delta_g] = 0\) exactly (when \(G \sim \mathcal{N}(0, \sigma^2)\) the expectation of \(e^G\) is \(e^{\sigma^2/2}\)).

Contamination injection: the true overdensity is contaminated using apply_contamination() with the known ground-truth amplitudes (a_true, b_true), so that parameter recovery tests compare inferred amplitudes directly against the injected values.

Poisson sampling (make_mock_catalog()):

n_g = rng.poisson(n_mean * (1.0 + delta_g_obs))   # (n_pix,) integer counts
# Random catalog: N_r = 8 * n_mean * n_pix objects uniformly over footprint
N_r = int(8 * n_mean * good_pixels.sum())

The ratio \(N_r / N_g \approx 8\) ensures that the random catalog shot noise contributes negligibly to the Landy–Szalay estimator variance (Landy & Szalay 1993 recommend \(N_r / N_g \geq 5\)).

Functions: generate_lognormal_field(), make_galactic_mask(), make_mock_catalog(), make_mock_suite(), MockCatalog()

HEALPix map utilities 

Module: sys_mapping.maps

HEALPix (Hierarchical Equal Area isoLatitude Pixelisation) divides the sphere into \(N_{\rm pix} = 12\,{\rm NSIDE}^2\) equal-area pixels. Common NSIDE values and their properties:

NSIDE	\(N_{\rm pix}\)	Pixel area (deg²)	Angular resolution (arcmin)
32	12 288	3.36	≈ 110
64	49 152	0.84	≈ 55
128	196 608	0.21	≈ 27
512	3 145 728	0.013	≈ 7

For LS10 BGS analyses NSIDE = 64–128 is used.

Workflow to go from catalogs to overdensity and template arrays:

pixelize_catalog() — bin RA/Dec positions into a HEALPix count map.
compute_overdensity() — form \(\hat\delta_g(p) = n_g(p)/(f_r n_r(p)) - 1\) and return the boolean unmasked-pixel array.
assign_template_values() — extract template values at unmasked pixels, returning a (n_sys, n_good_pix) array.

Technical implementation.

The three-step catalog-to-overdensity pipeline executes as follows:

Step 1 — pixelise (pixelize_catalog()):

pix_ids = hp.ang2pix(nside, ra, dec, lonlat=True)   # RA/Dec → HEALPix ring index
count_map = np.bincount(pix_ids, minlength=12*nside**2).astype(float)

hp.ang2pix with lonlat=True accepts RA/Dec in degrees. The result is a (12×NSIDE²,) integer count map covering the full sky (zeros outside the footprint). The bincount is \(O(N_{\rm gal})\).

Step 2 — overdensity (compute_overdensity()):

f_r          = n_gal.sum() / n_rand.sum()    # galaxy-to-random normalisation
good_pixels  = (n_rand > 0) & mask           # boolean, (n_pix,)
delta_g_full = np.zeros(n_pix)
delta_g_full[good_pixels] = (
    n_gal[good_pixels] / (f_r * n_rand[good_pixels]) - 1.0)

The normalisation factor \(f_r = N_g / N_r\) accounts for the different total sizes of the galaxy and random catalogs. Pixels with zero random count are masked to avoid division by zero.

Step 3 — template extraction (assign_template_values()):

delta_t = template_full_sky[:, good_pixels]   # (n_sys, n_good_pix)
# Centre and normalise each template to zero mean, unit variance
delta_t -= delta_t.mean(axis=1, keepdims=True)
delta_t /= delta_t.std(axis=1, keepdims=True)

The centring and normalisation ensure that template amplitudes \(a_i, b_i\) are dimensionless and comparable across templates of different physical origins.

Functions: pixelize_catalog(), compute_overdensity(), assign_template_values(), generate_systematic_map(), generate_systematic_maps(), systematic_power_spectrum()

Two-point function measurement 

Module: sys_mapping.utils

The angular two-point correlation function \(w(\theta)\) is measured using the Landy-Szalay estimator:

\[w(\theta) = \frac{DD(\theta) - 2\,DR(\theta) + RR(\theta)}{RR(\theta)}\]

where \(DD(\theta)\), \(DR(\theta)\), \(RR(\theta)\) are the normalised pair counts in angular separation bin \(\theta\) for galaxy-galaxy, galaxy-random, and random-random pairs respectively. The estimator is unbiased for large random catalogs and has minimum variance for a Poisson-distributed galaxy field (Landy & Szalay 1993, ApJ 412, 64).

sys_mapping provides wrappers for TreeCorr (which uses a ball-tree for efficient pair counting, \(\mathcal{O}(N\log N)\)) and for Corrfunc (highly optimised C implementation for large catalogs).

Technical implementation.

TreeCorr backend (measure_two_point_function()): wraps treecorr.NNCorrelation with a ball-tree pair counter. The angular bins are specified in log-space (min_sep, max_sep in degrees, nbins bins), and the catalog is passed once for DD and once for DR:

cat_g = treecorr.Catalog(ra=ra_g, dec=dec_g, ra_units="deg", dec_units="deg")
cat_r = treecorr.Catalog(ra=ra_r, dec=dec_r, ra_units="deg", dec_units="deg")
nn = treecorr.NNCorrelation(min_sep=min_sep, max_sep=max_sep,
                             nbins=nbins, sep_units="deg")
nn.process(cat_g)         # DD counts
nr = nn.copy(); nr.process(cat_g, cat_r)   # DR counts
rr = nn.copy(); rr.process(cat_r)          # RR counts
xi, varxi = nn.calculateXi(rr=rr, dr=nr)  # Landy–Szalay

TreeCorr’s ball-tree algorithm has \(O(N \log N)\) pair-count complexity; it applies the Landy–Szalay formula internally.

Corrfunc backend (measure_two_point_function_corrfunc()): calls Corrfunc.mocks.DDtheta_mocks directly on RA/Dec arrays. Corrfunc is a highly optimised C library that vectorises the inner pair loop with AVX2 SIMD instructions and outperforms TreeCorr by 2–5× for \(N \gtrsim 10^5\):

DD = Corrfunc.mocks.DDtheta_mocks(autocorr=True, nthreads=n_threads,
        binfile=theta_bins, RA=ra_g, DEC=dec_g)
DR = Corrfunc.mocks.DDtheta_mocks(autocorr=False, nthreads=n_threads,
        binfile=theta_bins, RA1=ra_g, DEC1=dec_g, RA2=ra_r, DEC2=dec_r)
RR = Corrfunc.mocks.DDtheta_mocks(autocorr=True, nthreads=n_threads,
        binfile=theta_bins, RA=ra_r, DEC=dec_r)
w_theta = (DD - 2*DR + RR) / RR   # Landy–Szalay (normalised pair counts)

Both backends return the same (n_bins,) array w_theta and are interchangeable. The Corrfunc backend should be preferred for production runs with \(N > 10^6\) galaxies.

Functions: measure_two_point_function(), measure_two_point_function_corrfunc(), measure_kk_correlation_treecorr(), measure_kk_correlation_corrfunc(), compute_covariance_matrix(), compute_amplitude_bias()

Method comparison table 

Method	Reference(s)	Status	Module
Additive contamination model (MCMC)	Berlfein+2024	Implemented	`likelihood`, `inference`
Multiplicative contamination model (MCMC)	Berlfein+2024	Implemented	`likelihood`, `inference`
Combined contamination model (MCMC)	Berlfein+2024	Implemented	`likelihood`, `inference`
PCA template rotation	Berlfein+2024	Implemented	`correction`
Noise debiasing	Berlfein+2024	Implemented	`correction`
Two-point function correction	Berlfein+2024	Implemented	`contamination`, `correction`
Likelihood ratio model selection	Berlfein+2024	Implemented	`model_selection`
Block bootstrap covariance	Berlfein+2024	Implemented	`bootstrap`
Jack-knife covariance	Ross+2011, Ho+2012	Implemented	`bootstrap`
Null test cross-correlations	Ross+2011	Implemented	`diagnostics`
Pseudo-\(C_\ell\) estimator	Elsner+2016, Ho+2012	Implemented	`power_spectrum`
Harmonic template subtraction	Elsner+2016	Implemented	`power_spectrum`
Harmonic bias correction	Elsner+2016	Implemented	`power_spectrum`
Basic / Extended mode projection (BMP/EMP)	Elsner+2016, Leistedt+2014	Implemented	`power_spectrum`
Harmonic power spectrum correction pipeline	Elsner+2016, Elsner+2017	Implemented	`correction`
ElasticNet regression	Weaverdyck+2021	Implemented	`regression`
Iterative Systematics Decontamination (ISD)	Rodríguez-Monroy+2025	Implemented	`regression`
Multi-method comparison framework	Weaverdyck+2021	Implemented	`regression`
SNR-based template ranking	Tanidis+2026	Implemented	`diagnostics`
Footprint masking diagnostics	Rodríguez-Monroy+2025	Implemented	`diagnostics`
NaMaster coupling matrix (exact)	Alonso+2019, Elsner+2017	Optional backend (planned)	`power_spectrum`
Catalogue-based PCL deprojection	Cornish+2026	Deferred	`deprojection` (future)
Transfer function calibration	Cornish+2026	Deferred	`deprojection` (future)
ANN / deep-learning weights	Rezaie+2020	Not planned	External pre-processing
Optimal quadratic estimator (QMV)	Ho+2012	Not planned	External (NaMaster)
Shear E/B quadratic estimator	Tanidis+2026	Not planned	Weak-lensing specific