Methods Reference ================= This page maps every method implemented in ``sys_mapping`` to its source publication, provides the key equations with all notation spelled out, and gives physical motivation and practical guidance on when to use each method. For notation conventions see :doc:`overview`; for bibliographic detail on each reference see :doc:`bibliography`. .. contents:: :local: :depth: 2 ---- Core contamination model ------------------------- **Reference:** `Berlfein et al. 2024 `_ (MNRAS 531, 4954), Eq. 11–13. **Module:** :mod:`sys_mapping.contamination` **Physical motivation.** A photometric galaxy survey assigns each galaxy to a HEALPix pixel based on its observed position. The measured number count :math:`n_g(p)` per pixel can differ from the true number count :math:`n_g^{\rm true}(p)` for two physically distinct reasons: * *Additive* contamination: a contaminant contributes galaxy-like objects to the catalog, or masks real galaxies over a spatially coherent pattern. Examples: stellar contamination (stars misclassified as galaxies add to counts), fibre allocation inefficiency (sparse regions are under-observed). These shift :math:`\delta_g` by a fixed amount proportional to :math:`t_i`. * *Multiplicative* contamination: a varying selection efficiency scales galaxy counts up or down relative to the true density. Examples: depth variations (seeing, sky background, airmass) modulate the survey completeness as a function of position. These multiply :math:`1+\delta_g` by a factor proportional to :math:`1 + b_i t_i`. The combined forward model is: .. math:: \hat{\delta}_{g}(p) = \delta_{g}(p)\!\left(1 + \sum_{i=1}^{n_s} b_i\,t_i(p)\right) + \sum_{i=1}^{n_s} a_i\,t_i(p) where :math:`\delta_g(p) = n_g^{\rm true}(p)/\bar{n}_g - 1` is the true overdensity, :math:`\hat\delta_g(p)` is the observed overdensity, and :math:`t_i(p)` is systematic template :math:`i` normalised to zero mean and unit variance over the footprint. The amplitudes :math:`a_i` (additive) and :math:`b_i` (multiplicative) are the free parameters to be inferred. **Three nested models** are obtained by constraining these amplitudes: .. list-table:: :header-rows: 1 :widths: 20 20 22 38 * - Model - Free parameters - Constraint - Use case * - ``additive`` - :math:`a_i`, :math:`\sigma` - :math:`b_i = 0` - Stellar contamination, photometric bias * - ``multiplicative`` - :math:`a_i`, :math:`\sigma` - :math:`b_i = a_i` - Depth / completeness variations * - ``combined`` - :math:`a_i, b_i`, :math:`\sigma` - none - Unknown contamination type; general case **Inverting the model.** The clean overdensity can be recovered from the observed one via: .. math:: \delta_g(p) = \frac{\hat{\delta}_g(p) - \sum_i a_i\,t_i(p)} {1 + \sum_i b_i\,t_i(p)} This inversion is valid only when :math:`1 + \sum_i b_i t_i(p) > 0` in every pixel (i.e. the multiplicative modulation never entirely suppresses the galaxy density). For typical values :math:`|b_i| \lesssim 0.3` and unit-normalised templates this condition is satisfied almost everywhere. **Weight-based correction for linear methods.** For all additive-type methods (OLS, ElasticNet, ISD, MCMC-additive), the corrected field is computed as (Weaverdyck & Huterer 2020, Eq. 46-47): .. math:: \delta_g^{\rm corr}(p) = w(p)\,(1 + \hat{\delta}_g(p)) - 1, \qquad w(p) = \frac{1}{1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p)} This is exact for multiplicative contamination and equals the simpler subtraction :math:`\hat\delta_g - \hat{\boldsymbol\alpha}\cdot\mathbf{t}` at first order in :math:`\hat{\boldsymbol\alpha}\cdot\mathbf{t}`. Using the weight form preserves positivity (:math:`1 + \delta_g^{\rm corr} > 0`) and is the convention in the reference literature (Weaverdyck & Huterer 2020; Rodríguez-Monroy et al. 2025); it is applied uniformly throughout this package. For MCMC-combined the full inversion formula above is used instead. **Technical implementation.** All arithmetic is performed on **JAX arrays** (``jax.Array``) to enable JIT compilation and automatic differentiation. The forward and inverse models are implemented as pure functions with no mutable state: *Forward model* (:func:`~sys_mapping.contamination.apply_contamination`): .. code-block:: python mult_term = jnp.einsum("i,ij->j", b, delta_t) # Σ b_i t_i(p) add_term = jnp.einsum("i,ij->j", a, delta_t) # Σ a_i t_i(p) delta_g_obs = delta_g * (1.0 + mult_term) + add_term Both Einstein sums are single BLAS-1 operations of cost :math:`O(n_s \times N)`. At ``n_pix = 10\,000``, ``n_sys = 5`` the forward call runs in approximately **830 μs** after JIT warm-up (float32 accumulation; max absolute roundtrip error < 1e-5). *Inverse model* (:func:`~sys_mapping.contamination.invert_contamination`): adds an element-wise division (≈ **1 730 μs** at the same size). *Parameter vector layout* (:func:`~sys_mapping.contamination.pack_params`, :func:`~sys_mapping.contamination.unpack_params`): the flat ``float64`` vector fed to emcee / scipy optimisers has layout .. code-block:: text [a_0, …, a_{n_s-1}, ← always present b_0, …, b_{n_s-1}, ← combined model only sigma, ← always gamma] ← skew-normal only Slice operations preserve values to machine precision. The ``match``/``case`` dispatch (Python 3.10+) has zero runtime cost. **Functions:** :func:`~sys_mapping.contamination.apply_contamination` (forward model), :func:`~sys_mapping.contamination.invert_contamination` (recover clean field), :func:`~sys_mapping.contamination.pack_params` / :func:`~sys_mapping.contamination.unpack_params` (parameter vector layout), :func:`~sys_mapping.contamination.n_free_params`, :func:`~sys_mapping.contamination.compute_two_point_correction` ---- Gaussian and skew-normal likelihood ------------------------------------- **Reference:** `Berlfein et al. 2024 `_, Eq. 17–18. **Module:** :mod:`sys_mapping.likelihood` **Physical motivation.** To infer the contamination amplitudes :math:`(a_i, b_i)` from data, we need a likelihood model for the distribution of the clean overdensity field :math:`\delta_g(p)`. At the pixel level (with :math:`\bar{n} \gtrsim 30` galaxies per pixel), Poisson fluctuations in the count map are approximately Gaussian. The central limit theorem ensures that the overdensity in each pixel approaches a Gaussian as :math:`\bar{n} \to \infty`. **Gaussian likelihood.** Assuming :math:`\delta_g(p) \stackrel{\rm iid}{\sim} \mathcal{N}(0, \sigma^2)` for each unmasked pixel :math:`p`, the log-likelihood is (Eq. 17): .. math:: \ln\mathcal{L}(\boldsymbol\Theta) = -\frac{N}{2}\ln(2\pi\sigma^2) - \underbrace{\sum_p \ln\!\left|1 + \sum_i b_i\,t_i(p)\right|}_{\text{Jacobian of the change of variables}} - \frac{1}{2\sigma^2}\sum_p \underbrace{\left[\frac{\hat\delta_g(p) - \sum_i a_i t_i(p)}{1+\sum_i b_i t_i(p)}\right]^2}_{\delta_g(p)^2} Here :math:`\boldsymbol\Theta = (a_1,\ldots,a_{n_s},b_1,\ldots,b_{n_s},\sigma)`, :math:`N` is the number of unmasked pixels, and the Jacobian term accounts for the change of variable from :math:`\hat\delta_g` to :math:`\delta_g`. For the additive model (:math:`b_i=0`) the Jacobian term vanishes. **Skew-normal likelihood.** For a lognormal galaxy field, :math:`\delta_g = e^{G-\sigma_G^2/2}-1` has positive skewness :math:`s \approx 3\sigma_G + \sigma_G^3 > 0`. Fitting a Gaussian likelihood to a positively-skewed field introduces a systematic bias in the inferred amplitudes. The skew-normal extension (Eq. 18) corrects for this by adding a log-CDF term: .. math:: \ln\mathcal{L}_{\rm skew} = \ln\mathcal{L}_{\rm Gauss} + N\ln 2 + \sum_p \ln\Phi\!\left(\gamma\,\frac{\delta_g(p) - \xi}{\sigma}\right) where :math:`\Phi(z) = \tfrac{1}{2}[1+{\rm erf}(z/\sqrt{2})]` is the standard-normal CDF, :math:`\gamma > 0` is the skewness shape parameter (fitted alongside :math:`a_i, b_i, \sigma`), and :math:`\xi = -\sigma\delta\sqrt{2/\pi}` with :math:`\delta = \gamma/\sqrt{1+\gamma^2}` is a mean-shift that ensures :math:`\mathbb{E}[\delta_g]=0` is maintained. Setting :math:`\gamma=0` recovers the Gaussian exactly. **Technical implementation.** :func:`~sys_mapping.likelihood.make_log_likelihood` is a *factory*: it captures ``n_sys``, ``model``, and ``use_skewed`` as closed-over Python integers/booleans that become compile-time constants during JAX tracing. The returned function is decorated with ``@jax.jit`` and compiled on the first call (~**227 ms** compile latency; reuse the function object across all MCMC iterations). *Gaussian path* (``use_skewed=False``): .. code-block:: python mult_term = jnp.einsum("i,ij->j", b, delta_t) add_term = jnp.einsum("i,ij->j", a, delta_t) log_jac = jnp.sum(jnp.log(jnp.abs(1.0 + mult_term))) # Jacobian delta_g_clean = (delta_g_obs - add_term) / (1.0 + mult_term) log_gauss = -0.5*N*log(2πσ²) - (1/2σ²)*Σ delta_g_clean² return log_gauss - log_jac Execution time after JIT warm-up: **~284 μs/call** at :math:`N = 10\,000`, :math:`n_s = 5` on CPU. At zero contamination, matches ``scipy.stats.norm.logpdf`` sum to ``rtol = 1e-5``. *Skew-normal path* (``use_skewed=True``): .. code-block:: python delta_param = gamma / jnp.sqrt(1.0 + gamma**2) # δ = γ/√(1+γ²) xi = -sigma * delta_param * jnp.sqrt(2.0 / jnp.pi) residual = delta_g_clean - xi z = gamma * residual / sigma log_skew = N*log(2) + jnp.sum(log_ndtr(z)) # Σ log Φ(γ r/σ) return log_gauss + log_skew - log_jac The log-CDF :math:`\ln\Phi(z)` is evaluated via ``jax.scipy.special.log_ndtr``, which is numerically stable for large negative :math:`z` (where ``log(1+erf(z/√2))`` would underflow). Execution time: **~592 μs/call** (extra ``log_ndtr`` sum over :math:`N` pixels). Setting ``gamma = 0`` recovers the Gaussian to ``atol = 1e-6``. **Function:** :func:`~sys_mapping.likelihood.make_log_likelihood` ---- MCMC inference -------------- **Reference:** `Berlfein et al. 2024 `_, Sec. 4; `Goodman & Weare 2010 `_ (emcee sampler). **Module:** :mod:`sys_mapping.inference` **Physical motivation.** Given the likelihood :math:`\mathcal{L}(\boldsymbol\Theta)` and a prior :math:`\pi(\boldsymbol\Theta)`, Bayes' theorem gives the posterior: .. math:: p(\boldsymbol\Theta \mid \hat{\boldsymbol\delta}_g, \mathbf{T}) \propto \mathcal{L}(\boldsymbol\Theta)\,\pi(\boldsymbol\Theta) The prior on each :math:`a_i` and :math:`b_i` is a zero-mean Gaussian with unit variance, :math:`a_i \sim \mathcal{N}(0, 1)`, reflecting the expectation that contamination amplitudes are at most of order unity. The noise parameter :math:`\sigma > 0` has a half-normal prior. In the rotated template basis (see :ref:`Template PCA rotation`) the prior factorises across parameters, improving MCMC mixing. ``sys_mapping`` uses the **affine-invariant ensemble sampler** (`emcee `_, Foreman-Mackey et al. 2013). This sampler uses :math:`n_{\rm walkers}` parallel chains (walkers) that share information via stretch and walk moves. Key properties: * Works efficiently for posteriors with correlated parameters. * Automatically adapts to the posterior shape. * The only tuning parameter is the number of walkers (recommendation: :math:`n_{\rm walkers} \geq 4\,n_{\rm dim}`, typical choice :math:`n_{\rm walkers} = 150` for :math:`n_{\rm dim} = 7`). The MAP estimate is taken as the **posterior median** of the flat chain (after discarding the burn-in phase). The posterior variance is estimated from the chain covariance matrix. **Recommended settings:** .. list-table:: :header-rows: 1 :widths: 25 75 * - Parameter - Default / recommendation * - ``n_walkers`` - 150 (>= 4 x n_dim) * - ``n_steps`` - 1500 (beyond convergence) * - ``n_burn`` - 300 (discard early transient) * - Convergence check - ``emcee`` autocorrelation tau **Technical implementation.** *Log-probability wrapper* (:func:`~sys_mapping.inference.make_log_prob`): converts the JAX log-likelihood into an emcee-compatible Python callable. ``delta_g_obs`` and ``delta_t`` are converted to ``jnp.asarray`` once at construction time to avoid repeated host–device copies. A hard ``sigma <= sigma_min`` (default ``1e-6``) guard returns ``-inf`` without calling JAX, short-circuiting the dispatch overhead. *Vectorised mode* (``vectorize=True``): builds a single ``jax.jit(jax.vmap(log_likelihood))`` kernel that accepts the full ``(n_walkers, n_dim)`` position matrix and returns ``(n_walkers,)`` log-probabilities in one GPU dispatch per step. This is the primary source of GPU speedup and requires ``emcee >= 3.0``. *Walker initialisation* in :func:`~sys_mapping.inference.run_mcmc`: .. code-block:: python p0[:, :n_cont] = rng.normal(0.0, 1e-3, (n_walkers, n_cont)) p0[:, n_cont] = std(delta_g_obs) * (1 + rng.normal(0.0, 0.01, n_walkers)) # gamma (skew) initialised to N(0, 0.1) when use_skewed=True The :math:`10^{-3}` scatter in contamination amplitudes places walkers near zero (the prior centre) while providing enough diversity to explore the posterior within the first ~50 steps. ``sigma`` is initialised near the observed overdensity standard deviation, which is a good prior for the noise level. *Point estimate*: :func:`~sys_mapping.inference.get_mle_params` returns ``np.median(flat_chain, axis=0)`` — a robust estimator that is insensitive to outlier walkers and converges to the true parameter at :math:`O(1/\sqrt{n_{\rm samples}})`. *Covariance* (:func:`~sys_mapping.inference.get_param_covariance_from_chain`): slices the flat chain by column rather than unpacking sample by sample: .. code-block:: python a_arr = flat_chain[:, :n_sys] # (n_samples, n_sys) b_arr = flat_chain[:, n_sys : 2*n_sys] # combined only cov_a = np.cov(a_arr, rowvar=False) # (n_sys, n_sys) The full covariance (not just the diagonal) is needed to correctly propagate uncertainty through the PCA back-transformation ``cov_orig = R.T @ cov_rot @ R``. **Performance.** Total wall time scales as :math:`O(n_{\rm walkers} \times n_{\rm steps} \times N_{\rm pix} \times n_s)`. A typical run with ``n_walkers = 250``, ``n_steps = 1500``, :math:`N_{\rm pix} \approx 50\,000`, :math:`n_s = 5` takes a few minutes on CPU; the JAX likelihood is compiled once (~227 ms) then runs at ~284 μs/eval. **Functions:** :func:`~sys_mapping.inference.make_log_prob`, :func:`~sys_mapping.inference.run_mcmc`, :func:`~sys_mapping.inference.get_mle_params`, :func:`~sys_mapping.inference.get_param_variance_from_chain`, :func:`~sys_mapping.inference.get_param_covariance_from_chain` ---- .. _Template PCA rotation: Template PCA rotation ---------------------- **Reference:** `Berlfein et al. 2024 `_, Appendix A. **Module:** :mod:`sys_mapping.correction` **Physical motivation.** Systematic template maps are often correlated with each other (e.g. seeing correlates with depth because long exposures under good seeing produce deeper images). When templates are correlated, the contamination amplitudes :math:`a_i` cannot be determined independently — the likelihood has ridges in parameter space, MCMC chains mix slowly, and the inferred amplitudes in the original basis are unreliable. **PCA diagonalisation.** The template covariance matrix is: .. math:: C_{ij} = \frac{1}{N}\sum_{p=1}^N t_i(p)\,t_j(p) (a symmetric, positive semi-definite :math:`n_s \times n_s` matrix). Its eigendecomposition is :math:`C = V D V^\top` where :math:`V` is the :math:`n_s \times n_s` orthogonal matrix of eigenvectors and :math:`D = \mathrm{diag}(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{n_s})` contains the eigenvalues (template variances in the rotated basis). The **rotated templates** are: .. math:: \mathbf{T}' = V^\top \mathbf{T}, \qquad t'_j(p) = \sum_i V_{ji}\,t_i(p) so that :math:`C' = V^\top C V = D` is diagonal. The rotated templates are orthogonal by construction: .. math:: \frac{1}{N}\sum_p t'_i(p)\,t'_j(p) = \lambda_i\,\delta_{ij} In the rotated basis, the likelihood factorises across parameters and MCMC convergence is faster. The rotation is applied *before* MCMC and the back-transformation is applied *after*: .. math:: a_i = (V\,\mathbf{a}')_i = \sum_j V_{ij}\,a'_j, \qquad b_i = (V\,\mathbf{b}')_i = \sum_j V_{ij}\,b'_j The eigenvalues :math:`\lambda_j` measure the variance of each rotated template and indicate how many independent systematic modes are present. Large gaps in the eigenvalue spectrum suggest that some templates are essentially redundant. **Technical implementation.** :func:`~sys_mapping.correction.rotate_templates` performs the following steps on a ``(n_sys, n_pix)`` NumPy array: .. code-block:: python cov = delta_t @ delta_t.T / n_pix # (n_sys, n_sys), O(n_sys² × n_pix) eigenvalues, eigenvectors = np.linalg.eigh(cov) # symmetric → faster than eig idx = np.argsort(eigenvalues)[::-1] # descending order rotation_matrix = eigenvectors[:, idx].T # V^T, rows are eigenvectors delta_t_rot = rotation_matrix @ delta_t # (n_sys, n_pix) ``numpy.linalg.eigh`` exploits the symmetry and positive semi-definiteness of the covariance matrix (LAPACK ``dsyevd`` driver), making it faster and more accurate than the general ``eig`` decomposition. Cost: :math:`O(n_s^3 + n_s \times N)`. At ``n_sys = 5``, ``n_pix = 10\,000`` the call takes approximately **177 μs**. Numerical guarantees: * Off-diagonal elements of the rotated covariance ``delta_t_rot @ delta_t_rot.T / n_pix`` are :math:`< 10^{-10}` (orthogonality). * Rotation matrix satisfies ``R @ R.T = I`` to ``atol < 1e-12``. * Sum of eigenvalues equals ``trace(cov)`` to ``rtol < 1e-10`` (variance conservation). :func:`~sys_mapping.correction.transform_params_from_rotated` applies the inverse transformation ``a_orig = V @ a_rot`` where ``V = R.T``: .. code-block:: python V = rotation_matrix.T # (n_sys, n_sys) a_orig = V @ a_rot # two small (n_sys, n_sys) × (n_sys,) products b_orig = V @ b_rot Cost: two :math:`n_s \times n_s` matrix–vector multiplies (**~4 μs**). Roundtrip precision (rotate → infer → back-transform) is ``atol < 1e-12`` in float64. In :func:`~sys_mapping.regression.run_decontamination` (MCMC methods), the covariance matrix is propagated back as: .. code-block:: python cov_a = R.T @ cov_a_rot @ R # (n_sys, n_sys) sandwich product so that uncertainty in the rotated basis is correctly mapped to the original template basis without approximation. **Functions:** :func:`~sys_mapping.correction.rotate_templates`, :func:`~sys_mapping.correction.transform_params_from_rotated` ---- Noise debiasing --------------- **Reference:** `Berlfein et al. 2024 `_, Eq. 21. **Module:** :mod:`sys_mapping.correction` **Physical motivation.** When the true contamination amplitude :math:`a_i` is small (close to zero), the posterior median :math:`\hat{a}_i` fluctuates around zero due to noise. The squared amplitude :math:`\hat{a}_i^2` is therefore always positive even when the true amplitude is zero, and the expectation is: .. math:: \mathbb{E}[\hat{a}_i^2] = a_i^2 + \mathrm{Var}[\hat{a}_i] This is the classical *noise bias* familiar from power spectrum estimation (where the noise power spectrum must be subtracted from the measured pseudo- :math:`C_\ell`). **Debiased estimator.** The noise-debiased squared amplitude is: .. math:: \tilde{a}_i^2 = \max\!\bigl(\hat{a}_i^2 - \mathrm{Var}[\hat{a}_i],\; 0\bigr) and analogously: .. math:: \tilde{b}_i^2 = \max\!\bigl(\hat{b}_i^2 - \mathrm{Var}[\hat{b}_i],\; 0\bigr) The :math:`\max(\cdot, 0)` clip enforces the physical constraint :math:`a_i^2 \geq 0` (squared amplitudes are non-negative). When the true amplitude is zero, the expected debiased estimate is zero on average. The debiased amplitudes :math:`\tilde{a}_i^2, \tilde{b}_i^2` are the inputs to the two-point correction formula (see below). Using the raw (biased) squared amplitudes would over-subtract the template auto-correlation, introducing a spurious dip in the corrected :math:`w(\theta)`. **Technical implementation.** :func:`~sys_mapping.correction.debias_params` is a pure NumPy function operating on ``(n_sys,)`` float64 arrays: .. code-block:: python a_sq = np.maximum(a_hat**2 - var_a, 0.0) # clip at 0 (squared amp ≥ 0) b_sq = np.maximum(b_hat**2 - var_b, 0.0) Cost: two element-wise squares, two subtractions, two ``np.maximum`` calls — approximately **4 μs** regardless of :math:`n_s`. When the exact posterior variance is supplied, the debiased estimate recovers the true :math:`a_i^2` to ``atol < 1e-10`` (float64 round-off limited). The ``var_a`` and ``var_b`` inputs are the *diagonal* of the MCMC posterior covariance returned by :func:`~sys_mapping.inference.get_param_variance_from_chain`. In the PCA-rotated basis the off-diagonal covariance elements are small (templates are orthogonal after rotation), so the diagonal approximation is accurate. **Function:** :func:`~sys_mapping.correction.debias_params` ---- Two-point function correction ------------------------------ **Reference:** `Berlfein et al. 2024 `_, Eq. 15–16. **Module:** :mod:`sys_mapping.correction` **Derivation sketch.** The angular two-point correlation function of the observed field is: .. math:: \hat{w}(\theta) = \bigl\langle \hat\delta_g(\hat{n})\,\hat\delta_g(\hat{n}')\bigr\rangle_\theta Inserting the contamination model and expanding to first order in :math:`a_i^2` and :math:`b_i^2` (all cross-terms between different templates :math:`i \neq j` vanish on average because templates are uncorrelated with :math:`\delta_g` and with each other after normalisation): .. math:: \hat{w}(\theta) \approx w_g(\theta) + \sum_i b_i^2\,w_g(\theta)\,w_{t_i t_i}(\theta) + \sum_i a_i^2\,w_{t_i t_i}(\theta) = w_g(\theta)\!\left[1 + \sum_i b_i^2\,w_{t_i t_i}(\theta)\right] + \sum_i a_i^2\,w_{t_i t_i}(\theta) where :math:`w_{t_i t_i}(\theta) = \langle t_i(\hat{n})\,t_i(\hat{n}')\rangle_\theta` is the template auto-correlation at angular separation :math:`\theta`. Solving for :math:`w_g(\theta)` and replacing :math:`a_i^2 \to \tilde{a}_i^2`, :math:`b_i^2 \to \tilde{b}_i^2` (debiased estimates): .. math:: \hat{w}_{\rm corr}(\theta) = \frac{\hat{w}(\theta) - \sum_i \tilde{a}_i^2\,w_{t_i t_i}(\theta)} {1 + \sum_i \tilde{b}_i^2\,w_{t_i t_i}(\theta)} **When to use.** Apply this correction *after* measuring the raw :math:`w(\theta)` from the galaxy catalog and the template auto-correlations :math:`w_{t_i t_i}(\theta)` from the templates. The correction requires the debiased amplitudes from :func:`~sys_mapping.correction.debias_params`. The corrected :math:`\hat{w}_{\rm corr}(\theta)` is an unbiased estimate of :math:`w_g(\theta)` to first order in the contamination amplitudes. For the harmonic-space (power spectrum) equivalent, see :func:`~sys_mapping.correction.correct_power_spectrum_harmonic`. **Technical implementation.** :func:`~sys_mapping.contamination.compute_two_point_correction` operates on JAX arrays and is the inner kernel: .. code-block:: python add_bias = jnp.einsum("i,ij->j", a_sq, template_correlations) # (n_bins,) mult_bias = jnp.einsum("i,ij->j", b_sq, template_correlations) # (n_bins,) w_corr = (w_obs - add_bias) / (1.0 + mult_bias) Two Einstein sums over ``(n_sys, n_bins)`` produce ``(n_bins,)`` vectors; cost is :math:`O(n_s \times n_{\rm bins})` — negligible. Execution time: approximately **576 μs** at ``n_sys = 5``, ``n_bins = 15`` (dominated by JAX dispatch overhead). Matches the analytic formula to ``rtol = 1e-6`` (single float32 einsum). The convenience wrapper :func:`~sys_mapping.correction.correct_two_point_function` chains the NumPy debiasing step with the JAX correction kernel and converts the result back to a NumPy array: .. code-block:: python a_sq, b_sq = debias_params(a_hat, b_hat, var_a, var_b) # NumPy, ~4 μs w_corr = compute_two_point_correction( # JAX, ~576 μs jnp.asarray(w_obs), jnp.asarray(a_sq), jnp.asarray(b_sq), jnp.asarray(template_correlations)) return np.asarray(w_corr) # back to NumPy The ``template_correlations`` array (shape ``(n_sys, n_bins)``) is the auto-correlation :math:`w_{t_i t_i}(\theta)` of each template map, measured with the same angular binning as the galaxy :math:`w(\theta)`. **Functions:** :func:`~sys_mapping.correction.correct_two_point_function`, :func:`~sys_mapping.correction.correct_power_spectrum_harmonic` ---- Likelihood ratio test (model selection) ----------------------------------------- **Reference:** `Berlfein et al. 2024 `_, Eq. 19; Wilks 1938 (asymptotic theorem). **Module:** :mod:`sys_mapping.model_selection` **Physical motivation.** Given data, we want to know whether the multiplicative contamination terms :math:`b_i` are statistically necessary, or whether the simpler additive model with :math:`b_i = 0` provides an adequate description. The LRT is the standard frequentist tool for comparing nested models. **The LRT statistic.** For two nested models :math:`\mathcal{M}_0 \subset \mathcal{M}_1` with MLEs :math:`\hat{\Theta}_0` and :math:`\hat{\Theta}_1`: .. math:: \lambda_{\rm LR} = 2\!\left[\ln\mathcal{L}(\hat{\Theta}_1) - \ln\mathcal{L}(\hat{\Theta}_0)\right] \;\xrightarrow{N\to\infty}\; \chi^2(r), \quad r = k_1 - k_0 where :math:`k_0, k_1` are the numbers of free parameters in each model. For the additive (:math:`n_s + 1`) vs combined (:math:`2n_s + 1`) comparison, :math:`r = n_s`. **Interpretation.** A large :math:`\lambda_{\rm LR} \gg \chi^2_{r,0.95}` (the 95th percentile of the :math:`\chi^2(r)` distribution) means the combined model is strongly preferred — the data require multiplicative contamination. A small :math:`\lambda_{\rm LR} \approx 0` means the simpler additive model is sufficient. **Note on the sign convention.** The LRT statistic is always :math:`\lambda_{\rm LR} \geq 0` because adding parameters cannot decrease the maximised log-likelihood. If the two likelihoods are nearly equal, :math:`\lambda_{\rm LR} \approx 0` and the data cannot distinguish the models. **Technical implementation.** The LRT uses the same MCMC posterior median as the point estimate for both models. The test statistic is computed by evaluating :func:`~sys_mapping.likelihood.make_log_likelihood` at the two parameter vectors and differencing: .. code-block:: python lrt = 2.0 * (log_lik_combined(theta_1) - log_lik_additive(theta_0)) Both log-likelihood calls use the JIT-compiled function objects, so the evaluation cost is two forward passes through the likelihood (~284 μs each). The p-value is obtained from the :math:`\chi^2(r)` CDF via ``scipy.stats.chi2.sf(lrt, df=r)`` where ``r = n_sys`` for the additive-vs-combined comparison. Because the MCMC posterior median — not a true MLE from gradient optimisation — is used as the parameter estimate, the LRT statistic is a conservative approximation: the posterior median is inside the posterior but may not be exactly at the likelihood maximum, so :math:`\lambda_{\rm LR}` may be slightly underestimated. In practice, for well-sampled posteriors (``n_steps >= 1500``) the bias is negligible. **Function:** :func:`~sys_mapping.model_selection.likelihood_ratio_test` ---- Pseudo-:math:`C_\ell` power spectrum -------------------------------------- **References:** `Elsner, Leistedt & Peiris 2016 `_ (MNRAS 456, 2095), Sec. 2–3; `Elsner, Leistedt & Peiris 2017 `_ (MNRAS 465, 1847); `Leistedt & Peiris 2014 `_ (MNRAS 444, 2); `Ho et al. 2012 `_ (ApJ 761, 14). **Module:** :mod:`sys_mapping.power_spectrum` **Physical motivation.** Instead of working with the real-space two-point function :math:`w(\theta)`, one can characterise galaxy clustering in harmonic space via the angular power spectrum :math:`C_\ell`. Each multipole :math:`\ell` probes a characteristic angular scale :math:`\theta \sim \pi/\ell` (e.g. :math:`\ell = 100` corresponds to :math:`\approx 1.8°`). Harmonic-space estimators allow easy comparison with theoretical predictions computed via Boltzmann codes (CAMB, CLASS). **Pseudo-:math:`C_\ell` estimator.** Because the galaxy survey covers only a fraction of the sky, spherical harmonic modes are mixed by the survey mask :math:`W(p)`. The *pseudo-:math:`C_\ell`* estimator simply applies the mask and computes the power spectrum of the masked map: .. math:: \tilde{C}_\ell = \frac{1}{2\ell+1}\sum_{m=-\ell}^{\ell} \left|\sum_p W(p)\,\hat\delta_g(p)\,Y_{\ell m}^*(p)\,\Omega_p\right|^2 where :math:`Y_{\ell m}(p)` are the spherical harmonic basis functions and :math:`\Omega_p = 4\pi/N_{\rm pix}` is the pixel solid angle. In ``sys_mapping`` this is computed by passing the masked map to ``healpy.anafast``. The pseudo-:math:`C_\ell` is related to the true power spectrum :math:`C_\ell` via the **MASTER equation** (Mode-coupling Matrix in Coupled Harmonics for Angular Spectra Techniques, Hivon et al. 2002): .. math:: \langle\tilde{C}_\ell\rangle = \sum_{\ell'} M_{\ell\ell'}\,C_{\ell'} where the mode-coupling matrix :math:`M_{\ell\ell'}` depends only on the mask power spectrum :math:`\tilde{C}_\ell^{WW}`. **Template subtraction (TS)** in harmonic space (Elsner+2016, Eq. 1–2). Systematic contamination adds a spurious contribution to the measured pseudo-:math:`C_\ell`. For an additive contaminant with amplitude :math:`\hat\alpha_i`, the cleaned power spectrum is: .. math:: \tilde{C}_\ell^{\rm TS} = \hat{C}_\ell^{dd} - \sum_i \hat\alpha_i\,\hat{C}_\ell^{t_i t_i} where :math:`\hat{C}_\ell^{dd}` is the measured pseudo-:math:`C_\ell` of the galaxy map and :math:`\hat{C}_\ell^{t_i t_i}` is the pseudo-:math:`C_\ell` of template :math:`i`. **Harmonic bias** (Elsner+2016, Eq. 8). Template subtraction introduces a residual additive bias: .. math:: b_\ell = -\frac{n_s}{2\ell+1} This arises because subtracting a linear combination of templates removes :math:`n_s` degrees of freedom from the sky, reducing the effective number of modes at each :math:`\ell` by :math:`n_s`. At low multipoles (:math:`\ell \lesssim 10`) and with many templates (:math:`n_s \gtrsim 10`) this bias can be significant and must be corrected. **Mode Projection (BMP/EMP)**. An alternative to template subtraction is to *project out* the template modes from the covariance matrix before power spectrum estimation. In the **Basic Mode Projection** (BMP), the pixel-pixel covariance matrix is modified to marginalise over the template amplitudes. The **Extended Mode Projection** (EMP, Leistedt+2014) additionally discards multipoles where the template signal exceeds a threshold :math:`k` times the noise, with an analytical bias correction. **Technical implementation.** *Pseudo-Cℓ measurement* (:func:`~sys_mapping.power_spectrum.measure_pseudo_cl`): .. code-block:: python masked_map = delta_g * mask_float # element-wise multiplication cl = hp.anafast(masked_map, lmax=lmax, use_pixel_weights=True) # healpy spherical harmonic transform ell = np.arange(len(cl)) The mask is applied as a float weight map (``True`` → 1.0, ``False`` → 0.0). ``use_pixel_weights=True`` enables healpy's pixel-area weighting scheme, which corrects for the geometric distortion of the HEALPix equal-area pixels at high Galactic latitude and improves accuracy at :math:`\ell \gtrsim N_{\rm side}`. The returned ``pseudo_cl`` is the power of the masked map and must be divided by :math:`W_2 = N_{\rm pix}^{-1}\sum_p W^2(p)` (the mask power) for a first-order sky-fraction correction. *Template subtraction* (:func:`~sys_mapping.power_spectrum.subtract_template_cl`): calls ``hp.anafast`` once per template to compute :math:`\tilde{C}_\ell^{t_i t_i}`, then subtracts the weighted contribution: .. code-block:: python cl_cleaned = pseudo_cl.copy() for a_i, t_i in zip(alpha, delta_t): masked_template = t_i * mask_float cl_t = hp.anafast(masked_template, lmax=lmax, use_pixel_weights=True) cl_cleaned -= a_i * cl_t Cost: :math:`O(n_s)` harmonic transforms. Each transform has complexity :math:`O(N_{\rm pix} \log N_{\rm pix})` (the ``healpy`` SHT). *Harmonic bias correction* (:func:`~sys_mapping.power_spectrum.harmonic_bias`): a closed-form expression requiring no iteration: .. code-block:: python bias = -n_templates / (2.0 * ell + 1.0) # (n_ell,) *Mode Projection* (:func:`~sys_mapping.power_spectrum.mode_projection_bias`): both BMP and EMP use the same ``harmonic_bias`` formula; EMP additionally applies a per-multipole mask based on the template-to-signal ratio: .. code-block:: python template_ratio = |harmonic_bias(1, ell)| / (cl_signal / n_templates) project_mask = template_ratio > threshold n_projected = project_mask.sum() bias = np.where(project_mask, harmonic_bias(n_projected, ell), 0.0) When ``coupling_matrix`` is not ``None``, the scalar bias vector is propagated through the mode-coupling matrix :math:`b_\ell^{\rm pseudo} = M_{\ell\ell'} b_{\ell'}` before subtraction. Passing ``coupling_matrix=None`` uses the identity (full-sky approximation). **Functions:** :func:`~sys_mapping.power_spectrum.measure_pseudo_cl`, :func:`~sys_mapping.power_spectrum.subtract_template_cl`, :func:`~sys_mapping.power_spectrum.harmonic_bias`, :func:`~sys_mapping.power_spectrum.mode_projection_bias` ---- Block bootstrap covariance --------------------------- **References:** `Berlfein et al. 2024 `_, Sec. 6.2; `Ross et al. 2011 `_. **Module:** :mod:`sys_mapping.bootstrap` **Physical motivation.** Galaxy overdensity maps have spatially correlated structure on scales of tens to hundreds of Mpc (the same clustering signal we are trying to measure). This means that the uncertainty in any estimator computed from these maps cannot be estimated by treating pixels as independent — the effective number of degrees of freedom is far smaller than the pixel count. A *spatial block bootstrap* preserves the spatial covariance structure by resampling *patches* rather than individual pixels. **Algorithm.** The survey footprint is divided into :math:`K` approximately equal-area spatial patches by coarsening to a lower-resolution HEALPix grid. Each bootstrap resample draws :math:`K` patches with replacement, then evaluates the estimator :math:`\hat\theta` on the resampled dataset. The variance estimate is: .. math:: \widehat{\rm Var}[\hat\theta] = \frac{1}{B-1}\sum_{b=1}^B \bigl(\hat\theta_b - \bar{\hat\theta}\bigr)^2, \qquad \bar{\hat\theta} = \frac{1}{B}\sum_{b=1}^B \hat\theta_b where :math:`B` is the number of bootstrap resamples (typical: :math:`B = 100–500`). **Typical settings.** Use :math:`K \approx 10–30` patches and :math:`B \approx 100–200` resamples. Fewer patches reduce the spatial resolution of the covariance estimate; more patches increase computational cost linearly. **Technical implementation.** *Patch assignment* (:func:`~sys_mapping.bootstrap._assign_patches`): the footprint is partitioned by coarsening to a lower HEALPix NSIDE: .. code-block:: python nside_patch = max(1, nside // int(sqrt(nside2npix(nside) / n_patches))) theta, phi = hp.pix2ang(nside, pixel_indices) # unmasked pixels only coarse_pix = hp.ang2pix(nside_patch, theta, phi) patch_ids = np.unique(coarse_pix, return_inverse=True)[1] # 0..K-1 This ensures patches are approximately equal area (HEALPix is equal-area at each resolution) and spatially contiguous, preserving large-scale galaxy clustering modes within each patch. The coarse NSIDE is chosen so that there are :math:`\approx n_{\rm patches}` unique coarse pixels. *Bootstrap loop* (:func:`~sys_mapping.bootstrap.block_bootstrap_variance`): .. code-block:: python for _ in range(n_bootstrap): sampled_patches = rng.choice(unique_patches, size=K, replace=True) pixel_mask = np.concatenate( [np.where(patch_ids == p)[0] for p in sampled_patches]) dg_boot = delta_g_obs[pixel_mask] dt_boot = delta_t[:, pixel_mask] bootstrap_estimates.append(estimator(dg_boot, dt_boot)) variance = np.var(np.stack(bootstrap_estimates), axis=0, ddof=1) Resampling ``K`` patches with replacement means that some patches are included multiple times and others are omitted, which mimics the sampling variance of the survey area. The ``ddof=1`` divisor gives an unbiased estimate of the variance of the estimator. **Performance.** At ``n_boot = 50``, ``n_patches = 8``, NSIDE 32, the entire bootstrap loop takes approximately **32 ms** for a simple mean estimator. Total cost scales linearly as :math:`O(n_{\rm boot} \times \text{cost}(\hat\theta))`. **Function:** :func:`~sys_mapping.bootstrap.block_bootstrap_variance` ---- Jack-knife covariance ---------------------- **References:** `Ross et al. 2011 `_ (MNRAS 417, 1350), Sec. 4.3; `Ho et al. 2012 `_ (ApJ 761, 14). **Module:** :mod:`sys_mapping.bootstrap` **Physical motivation.** The jack-knife is a *deterministic* leave-one-out estimator that does not require random resampling. It is more stable than the bootstrap for a fixed number of patches because it uses all :math:`K` leave-one-out datasets rather than a random subset. **Algorithm.** For each of the :math:`K` patches, remove that patch and evaluate the estimator on the remaining :math:`K-1` patches. The jack-knife covariance matrix is: .. math:: \widehat{\rm Cov}[\hat\theta] = \frac{K-1}{K} \sum_{k=1}^K \bigl(\hat\theta_{(-k)} - \bar{\hat\theta}\bigr) \bigl(\hat\theta_{(-k)} - \bar{\hat\theta}\bigr)^\top where :math:`\hat\theta_{(-k)}` is the estimator with patch :math:`k` removed, and :math:`\bar{\hat\theta} = K^{-1}\sum_k \hat\theta_{(-k)}` is the mean over jack-knife samples. The prefactor :math:`(K-1)/K` ensures the estimator is unbiased for the true covariance in the limit of large :math:`K`. **Comparison with block bootstrap.** For a well-designed spatial decomposition, the jack-knife and block bootstrap covariances should agree to within :math:`\sim 20\%`. The jack-knife is faster (deterministic, :math:`K` estimator calls vs :math:`B` for bootstrap) but more sensitive to the choice of patch geometry. The bootstrap can capture asymmetric distributions that the jack-knife misses. **Technical implementation.** The jack-knife uses the same ``_assign_patches`` helper as the bootstrap, so patch geometry is identical. The loop is fully deterministic: .. code-block:: python leave_one_out = [] for k in unique_patches: keep = patch_ids != k # boolean mask estimate_k = estimator(delta_g_obs[keep], delta_t[:, keep]) leave_one_out.append(estimate_k) estimates = np.stack(leave_one_out) # (K, n_out) mean_est = estimates.mean(axis=0) residuals = estimates - mean_est cov = (K-1)/K * (residuals.T @ residuals) The outer product ``residuals.T @ residuals`` computes the full ``(n_out, n_out)`` covariance matrix in one BLAS-3 call. For scalar estimators (``n_out = 1``) this reduces to a sum of squared deviations. A ``UserWarning`` is issued if ``n_patches < 5`` (the covariance estimate is unreliable when few patches are available). **Cost**: :math:`K` evaluations of the estimator (compared to :math:`B` for bootstrap, with :math:`K \ll B` typically). For ``K = 10`` patches the jack-knife is 10–20× faster than 100–200 bootstrap resamples. **Function:** :func:`~sys_mapping.bootstrap.jackknife_covariance` ---- ElasticNet regression ---------------------- **Reference:** `Weaverdyck & Huterer 2021 `_ (MNRAS 503, 5061), Eq. 9–10. **Module:** :mod:`sys_mapping.regression` **Physical motivation.** When there are many systematic templates (:math:`n_s \gg 1`) and only a few carry genuine contamination, standard OLS tends to over-fit: it assigns non-zero amplitudes to all templates, including those that are noise-dominated. Regularised regression shrinks unimportant amplitudes toward zero while retaining the dominant ones. **ElasticNet loss function.** ElasticNet minimises: .. math:: \mathcal{L}(\boldsymbol\alpha) = \frac{1}{2N} \left\|\hat{\boldsymbol\delta}_g - \mathbf{T}^\top\boldsymbol\alpha\right\|_2^2 + \underbrace{\lambda_1\|\boldsymbol\alpha\|_1}_{\rm Lasso\;(L1)} + \underbrace{\lambda_2\|\boldsymbol\alpha\|_2^2}_{\rm Ridge\;(L2)} where :math:`\boldsymbol\alpha = (a_1,\ldots,a_{n_s})^\top` are the additive contamination amplitudes, :math:`\mathbf{T}` is the :math:`n_s \times N` template matrix, and the regularisation strengths satisfy :math:`\lambda_1 = \lambda\,\rho` and :math:`\lambda_2 = \lambda(1-\rho)/2` for a total regularisation :math:`\lambda` and L1 fraction :math:`\rho \in [0,1]` (``l1_ratio`` in the code). The L1 term (**Lasso** regularisation) promotes sparsity — small amplitudes are shrunk exactly to zero, automatically selecting the dominant templates. The L2 term (**Ridge** regularisation) stabilises correlated templates by sharing the coefficient equally. The ElasticNet combination is preferable when templates are both numerous and correlated. When ``alpha_reg=None`` the optimal :math:`\lambda` is found by :math:`k`-fold cross-validation on the pixel grid. **Pixel weights.** After fitting, systematic weights are assigned as: .. math:: w(p) = \frac{1}{\max\!\left(1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p),\;\epsilon\right)} with :math:`\epsilon = 10^{-6}` to prevent division by zero. **Requires:** ``scikit-learn >= 1.3`` (``pip install sys_mapping[regression]``) **Technical implementation.** :func:`~sys_mapping.regression.elasticnet_contamination_fit` transposes the template matrix so that scikit-learn receives the standard design matrix convention (samples × features): .. code-block:: python X = delta_t.T # (n_pix, n_sys) — samples × features for sklearn y = delta_g_obs # (n_pix,) When ``alpha_reg=None`` (auto-tune), ``sklearn.linear_model.ElasticNetCV`` runs ``cv_folds``-fold cross-validation on the pixel grid. The pixel grid is split into contiguous chunks of roughly :math:`N/k` pixels each (scikit-learn's default stratified-free split), so folds may mix spatially. For strongly spatially correlated overdensity maps this can underestimate the optimal :math:`\lambda`; in practice the effect is small relative to the statistical noise. The CV-selected ``alpha_`` is stored in ``cv_info``. .. code-block:: python model = ElasticNetCV(l1_ratio=l1_ratio, cv=cv_folds, n_jobs=-1) model.fit(X, y) alpha_reg = float(model.alpha_) # optimal λ from CV cv_scores = model.mse_path_.mean(axis=-1) # mean MSE vs λ grid The fitted coefficients are extracted as ``model.coef_`` (shape ``(n_sys,)``). Per-pixel weights are then computed and clipped to :math:`[1/20, 20]` to prevent numerical blow-up in pixels where :math:`1 + \hat{\boldsymbol\alpha}\cdot\mathbf{t}(p) \approx 0`: .. code-block:: python denominator = np.maximum(1.0 + alpha_hat @ delta_t, 1e-6) weights = np.clip(1.0 / denominator, 1.0/20.0, 20.0) The clip bound ``_ISD_MAX_WEIGHT = 20`` is shared with the ISD pipeline; it corresponds to a denominator floor of 0.05 — generous enough not to bias well-behaved scenarios. **Functions:** :func:`~sys_mapping.regression.elasticnet_contamination_fit`, :func:`~sys_mapping.regression.method_comparison` ---- Iterative Systematics Decontamination (ISD) -------------------------------------------- **References:** `Rodríguez-Monroy et al. 2025 `_; `Rezaie et al. 2020 `_. **Module:** :mod:`sys_mapping.regression` **Physical motivation.** Standard OLS fits a linear model :math:`\hat\delta_g \approx \sum_i \alpha_i t_i`. When contamination is multiplicative, the correct model is :math:`\hat\delta_g \approx \delta_g(1 + \sum_i b_i t_i)`, which is non-linear in the templates. ISD approximates this non-linearity by expanding the template basis with polynomial cross-terms and iterating the OLS solution. **Algorithm.** Starting with the original templates :math:`\{t_i\}_{i=1}^{n_s}`, the expanded basis at polynomial order :math:`d` includes all products :math:`t_i^{k_1} t_j^{k_2}\cdots` with :math:`k_1 + k_2 + \cdots \leq d`. At order :math:`d=1` this is just the original templates (standard OLS). At order :math:`d=3` the cross terms :math:`t_i t_j` (for :math:`i \leq j`) and triple products :math:`t_i t_j t_k` are added. For :math:`n_s` templates the expanded basis has :math:`\binom{n_s + d}{d} - 1` columns (e.g. 55 columns for :math:`n_s = 5, d = 3`). The iterative loop: 1. Form the weighted normal equations using per-pixel weights :math:`w^{(k)}(p)`: .. math:: \bigl(\mathbf{X}^\top \mathbf{W}^{(k)} \mathbf{X} + \mathbf{\Lambda}\bigr) \hat{\boldsymbol\alpha}^{(k)} = \mathbf{X}^\top \mathbf{W}^{(k)} \hat{\boldsymbol\delta}_g where :math:`\mathbf{X}` is the :math:`(N_{\rm pix} \times N_{\rm exp})` expanded design matrix (constant across iterations), :math:`\mathbf{W}^{(k)} = \mathrm{diag}(w^{(k)})`, and :math:`\mathbf{\Lambda}` is an optional ridge penalty matrix (see the ridge regularisation note below). 2. Update weights from the **linear coefficients only** (first :math:`n_s` entries of :math:`\hat{\boldsymbol\alpha}^{(k)}`): .. math:: w^{(k+1)}(p) = \frac{1}{1 + \hat{\boldsymbol\alpha}^{(k)}_{\rm lin} \cdot \mathbf{t}(p)} 3. Repeat until convergence: :math:`\|\mathbf{w}^{(k+1)} - \mathbf{w}^{(k)}\|_2 / \|\mathbf{w}^{(k)}\|_2 < \varepsilon = 10^{-5}`. The polynomial cross-terms improve the linear coefficient estimates but are not propagated into the weight formula, which keeps the same form as plain OLS. The cleaned field is recovered as :math:`\hat\delta_g^{\rm clean}(p) = w(p)(1 + \hat\delta_g^{\rm obs}(p)) - 1`. .. _ISD normal equations speedup: .. note:: **Speed (v0.2.4).** Since :math:`\mathbf{X}` is constant across iterations, it is precomputed once outside the loop as :math:`\mathbf{X}^\top` (shape :math:`N_{\rm exp} \times N_{\rm pix}`). The per-iteration cost is then one BLAS-3 matrix multiply .. math:: \mathbf{X}^\top \mathbf{W}^{(k)} \mathbf{X} = \underbrace{(\mathbf{X}^\top \odot \mathbf{w}^{(k)})}_{\text{broadcast scale}} \mathbf{X} producing an :math:`(N_{\rm exp} \times N_{\rm exp})` matrix — tiny regardless of :math:`N_{\rm pix}` — followed by ``lstsq`` on this small system. This replaces the previous approach of computing the SVD of the full :math:`(N_{\rm pix} \times N_{\rm exp})` matrix at each iteration, giving a **5–15× speed-up** per iteration for realistic pixel counts (:math:`N_{\rm pix} \gtrsim 10^4`). .. _ISD-3 ridge regularisation: .. note:: **ISD-3 accuracy: ridge regularisation on polynomial columns (v0.2.4).** A key source of ISD-3 instability is a *fixed-point inconsistency*: the regression minimises residuals over all :math:`N_{\rm exp}` columns (including polynomial cross-terms), but only the first :math:`n_s` linear coefficients drive the weight update. When the polynomial columns are correlated with the linear ones — which is always the case for survey systematics — they absorb a share of the linear signal. The linear coefficients :math:`\hat{\boldsymbol\alpha}_{\rm lin}` are left systematically small, and the iteration converges to a fixed point where the contamination has not actually been removed. The fix is a ridge penalty :math:`\lambda_{\rm poly}` applied *exclusively* to the polynomial-only columns (indices :math:`n_s:` of the expanded basis): .. math:: \mathbf{\Lambda} = \mathrm{diag}\!\underbrace{(0,\ldots,0}_{n_s}, \underbrace{\lambda_{\rm poly},\ldots,\lambda_{\rm poly}}_{N_{\rm exp}-n_s}) The linear columns are **never** penalised, so the weight update is not biased. The penalty suppresses the polynomial terms just enough to prevent them from over-fitting noise or absorbing linear signal. **Recommended value:** :math:`\lambda_{\rm poly} \approx 10^{-3}\,{\rm Var}(\hat\delta_g)`. This is small enough to leave the result unaffected when the polynomial terms are genuinely needed, and large enough to stabilise ISD-3 when templates are strongly correlated. Exposed as the ``lambda_poly`` argument to :func:`~sys_mapping.regression.iterative_systematics_decontamination` and as ``isd_lambda_poly`` in :func:`~sys_mapping.regression.run_decontamination`. Default is ``0.0`` (plain OLS, backward-compatible). **Convergence.** ISD-1 reliably converges within ≲ 25 iterations for all tested configurations. ISD-3 with ``lambda_poly = 0`` may still oscillate when templates are strongly correlated; setting ``lambda_poly = 1e-3 * np.var(delta_g_obs)`` resolves this in practice. In the worst case the function returns weights at ``max_iter``, which remain a useful (if not fully converged) correction. **When to use.** * **ISD-1** (``poly_order=1``): equivalent to OLS; fastest option; use as a baseline. * **ISD-3** (``poly_order=3``): partially corrects multiplicative contamination without a full MCMC run. Set ``isd_lambda_poly = 1e-3 * np.var(delta_g_obs)`` when ISD-3 produces implausible solutions or fails to converge. * **MCMC-combined**: preferred for high-precision analyses with known multiplicative contamination — it jointly infers additive and multiplicative amplitudes with full posterior uncertainties. **Technical implementation — polynomial expansion.** The expanded template basis is built with ``itertools.combinations_with_replacement``, which generates all multi-indices of degree :math:`\leq d` in lexicographic order: .. code-block:: python rows = [delta_t] # (n_sys, n_pix) — linear terms always first for deg in range(2, poly_order + 1): for combo in combinations_with_replacement(range(n_sys), deg): product = ones(n_pix) for idx in combo: product *= delta_t[idx] rows.append(product[newaxis, :]) delta_t_expanded = vstack(rows) # (n_expanded, n_pix) For ``n_sys = 5``, ``poly_order = 3`` this yields :math:`\binom{5+3}{3} - 1 = 55` columns. If ``n_pix < 10 × n_expanded`` the system would be poorly conditioned; ``poly_order`` is reduced automatically with a ``UserWarning``. **Technical implementation — weighted normal equations.** The design matrix ``X = delta_t_expanded.T`` (shape ``(n_pix, n_expanded)``) is computed *once* before the iteration loop. Each iteration updates only the diagonal weight matrix: .. code-block:: python Xt = delta_t_expanded # (n_expanded, n_pix), constant X = delta_t_expanded.T # (n_pix, n_expanded), constant ridge_diag = zeros(n_expanded) ridge_diag[n_sys:] = lambda_poly # penalty on polynomial columns only for iteration in range(1, max_iter + 1): Xw = Xt * weights # (n_expanded, n_pix): broadcast scale XtWX = Xw @ X # (n_expanded, n_expanded): BLAS-3 XtWX.flat[::n_expanded+1] += ridge_diag # in-place diagonal add XtWy = Xw @ delta_g_obs # (n_expanded,): BLAS-1 alpha_hat_all, *_ = lstsq(XtWX, XtWy) # small system The key insight is that ``Xt * weights`` is a row-broadcast (scale each row of ``Xt`` by the corresponding weight), which is a single NumPy ufunc call, not a loop. The resulting ``(n_expanded, n_expanded)`` matrix (e.g. 55×55) is tiny and the subsequent ``lstsq`` is negligible. This replaces the previous approach of computing the SVD of the full ``(n_pix × n_expanded)`` matrix at each iteration, yielding a **5–15× speed-up** for :math:`N_{\rm pix} \gtrsim 10^4`. **Weight update and clipping.** Only the first ``n_sys`` (linear) coefficients drive the weight update: .. code-block:: python weights = 1.0 / max(1.0 + alpha_hat_all[:n_sys] @ delta_t, 1e-6) weights = np.clip(weights, 1.0/20.0, 20.0) # hard bound Pixels where ``1 + a@t → 0`` would otherwise receive weights up to :math:`10^6`, making all subsequent weighted OLS fits numerically meaningless. The clip to :math:`[1/20, 20]` keeps the iteration stable. **Convergence criterion.** .. code-block:: python rel_change = ||w_new - w_old|| / (||w_old|| + 1e-30) if rel_change < tol (default 1e-5): break ISD-1 typically converges in :math:`\lesssim 25` iterations. ISD-3 with ``lambda_poly = 0`` may oscillate when templates are strongly correlated; setting ``lambda_poly = 1e-3 × Var(δ_g)`` resolves this. **Two-pass outlier handling** (in :func:`~sys_mapping.regression.run_decontamination`). After pass 1, pixels that hit the weight clip boundary are flagged as outliers: .. code-block:: python boundary = 20.0 * (1 - 1e-6) outlier_mask = (weights >= boundary) | (weights <= 1.0/boundary) If any outliers exist *and* there are at least :math:`\max(10 n_s, 50)` clean pixels, pass 2 refits ISD on the clean subset only, then applies the clean coefficients to all pixels (including outliers). This two-pass scheme prevents the handful of pathological pixels from biasing the overall fit. The ``isd_outlier_mask`` and ``isd_masked_fraction`` fields in the result dict report how many pixels were flagged. **Functions:** :func:`~sys_mapping.regression.iterative_systematics_decontamination`, :func:`~sys_mapping.regression.run_decontamination` ---- Null tests and SNR ranking --------------------------- **References:** `Ross et al. 2011 `_, Sec. 3.2 (null tests); `Tanidis et al. 2026 `_ (SNR ranking); `Weaverdyck & Huterer 2021 `_ (method comparison); `Rodríguez-Monroy et al. 2025 `_ (footprint masking). **Module:** :mod:`sys_mapping.diagnostics` **Null test.** After applying the systematic correction, the per-pixel weights :math:`w(p)` should be statistically uncorrelated with all template maps. If the correction is complete, the Pearson correlation: .. math:: r_i = \frac{\sum_p \bigl(w(p) - \bar{w}\bigr)\bigl(t_i(p) - \bar{t}_i\bigr)} {\sqrt{\sum_p(w(p)-\bar{w})^2}\,\sqrt{\sum_p(t_i(p)-\bar{t}_i)^2}} should satisfy :math:`|r_i| \approx 0` for all :math:`i`. A significant residual correlation indicates that template :math:`i` has not been adequately corrected. Permutation p-values are computed by shuffling :math:`w(p)` and recomputing :math:`r_i` a large number of times. **SNR ranking.** Three estimators rank templates by their contaminating power: * ``"template"`` — :math:`{\rm SNR}_i = |\hat\alpha_i| / \sigma_{\hat\alpha_i}`, where :math:`\sigma_{\hat\alpha_i}` is the OLS standard error. This is the standard frequentist significance of each template in the linear fit. * ``"data"`` — :math:`{\rm SNR}_i = |\mathrm{Corr}(\hat\delta_g, t_i)|`, the absolute Pearson correlation between the observed overdensity and template :math:`i`. Fast to compute; does not require fitting. * ``"peak"`` — the peak value of the cross pseudo-:math:`C_\ell` :math:`\hat{C}_\ell^{dt_i}` divided by the noise level. Identifies templates that contaminate on specific angular scales. **Footprint masking sensitivity.** Cutting the survey at different mask thresholds (progressively removing pixels with extreme template values) and measuring how the fitted amplitudes :math:`\hat\alpha_i` change is a powerful sanity check: genuinely contaminated fields should show stable or decreasing amplitudes as the worst-affected pixels are removed (Rodríguez-Monroy+2025). **Technical implementation.** *Null test* (:func:`~sys_mapping.diagnostics.null_test_cross_correlations`): the Pearson correlation is computed in a numerically stable form using centred vectors: .. code-block:: python w_centered = weights - weights.mean() w_norm = sqrt(sum(w_centered**2)) for i, t_i in enumerate(delta_t): t_centered = t_i - t_i.mean() t_norm = sqrt(sum(t_centered**2)) r[i] = sum(w_centered * t_centered) / (w_norm * t_norm) Division by zero is guarded: if either norm is ``< 1e-30``, ``r[i] = 0.0`` is returned rather than raising an exception. Permutation p-values shuffle only the weight vector (not the templates), preserving the spatial structure of the templates: .. code-block:: python for _ in range(n_bootstrap): w_shuffled = rng.permutation(w_centered) r_perm = abs(sum(w_shuffled * t_centered) / (w_norm * t_norm)) if r_perm >= |r_obs|: count_extreme += 1 p_value = (count_extreme + 1) / (n_bootstrap + 1) # Laplace smoothing The ``+1`` in numerator and denominator is the standard conservative correction that ensures ``p_value > 0`` even when no permutation exceeds the observed statistic. *SNR ranking* (:func:`~sys_mapping.diagnostics.snr_template_ranking`): * **``"template"``** — fits OLS on the full template matrix ``X = delta_t.T``, estimates residual variance :math:`\hat\sigma^2 = \text{RSS}/(N-n_s)`, and computes the OLS t-statistic per parameter: .. code-block:: python alpha_hat, *_ = lstsq(X, delta_g_obs) sigma2 = sum(residual**2) / max(n_pix - n_sys, 1) XtX_inv = pinv(X.T @ X) param_var = sigma2 * diag(XtX_inv) snr = |alpha_hat| / sqrt(maximum(param_var, 1e-30)) Uses ``np.linalg.pinv`` for numerical stability when templates are nearly collinear. * **``"data"``** — computes ``|Corr(delta_g, t_i)|`` in a loop; cost :math:`O(n_s \times N)`. * **``"peak"``** — calls ``hp.anafast`` twice per template (once for the galaxy map, once cross with each template), then normalises by the median power of the galaxy auto-spectrum as a noise-level proxy: .. code-block:: python cl_g = hp.anafast(delta_g_obs, lmax=lmax) noise_level = median(|cl_g|) + 1e-30 for t_i in delta_t: cl_cross = hp.anafast(delta_g_obs, map2=t_i, lmax=lmax) snr[i] = max(|cl_cross|) / noise_level *Footprint masking diagnostics* (:func:`~sys_mapping.diagnostics.footprint_mask_diagnostics`): pixels are ranked by their RMS template value: .. code-block:: python rms_per_pixel = sqrt(mean(delta_t**2, axis=0)) # (n_pix,): ∑ t²_i / n_s rank_order = argsort(rms_per_pixel) # ascending For each masking level the lowest-``rms`` pixels are dropped first, ensuring that the surviving pixels have progressively higher systematic leverage. OLS is refitted on the surviving pixels at each level, and the amplitude scatter across levels quantifies sensitivity to the masking threshold: .. code-block:: python alpha_all[k], *_ = lstsq(delta_t[:, keep].T, delta_g_obs[keep]) scatter = std(alpha_all, axis=0) # (n_sys,): near zero → robust **Functions:** :func:`~sys_mapping.diagnostics.null_test_cross_correlations`, :func:`~sys_mapping.diagnostics.snr_template_ranking`, :func:`~sys_mapping.diagnostics.footprint_mask_diagnostics` ---- Mock catalog generation ------------------------ **References:** `Berlfein et al. 2024 `_ (validation strategy); `Coles & Jones 1991 `_ (lognormal field model); `Weaverdyck & Huterer 2021 `_ (mock injection tests). **Module:** :mod:`sys_mapping.mocks` **Physical motivation.** Validating a systematic decontamination method requires knowing the true underlying galaxy field and the true contamination amplitudes. Synthetic (mock) catalogs are generated with a known ground truth, so parameter recovery and decontamination quality can be measured exactly — something impossible with real data. **Lognormal galaxy field.** The true galaxy overdensity is: .. math:: \delta_g^{\rm true}(p) = e^{G(p) - \sigma^2/2} - 1 where :math:`G(p)` is a zero-mean Gaussian random field drawn from a power spectrum :math:`C_\ell^G \propto (\ell+1)^{-2}`, normalised so that the pixel-to-pixel variance equals :math:`\sigma^2`. This form ensures: * :math:`\delta_g^{\rm true}(p) > -1` everywhere (physically: number density :math:`n_g \propto 1 + \delta_g > 0`). * :math:`\mathbb{E}[\delta_g] = 0` by the normalisation of the exponential. * The distribution is positively skewed (characteristic of the real galaxy distribution). **Poisson sampling.** Galaxy counts are drawn as: .. math:: n_g(p) \sim {\rm Poisson}\!\bigl[\bar{n}\,(1 + \hat\delta_g(p))\bigr] where :math:`\bar{n}` is the mean galaxy count per pixel (set by the ``n_mean`` parameter) and :math:`\hat\delta_g(p)` is the contaminated overdensity. A random catalog of :math:`N_r = 8\bar{n}N` objects is generated uniformly over the unmasked footprint. **Technical implementation.** *Lognormal field generation* (:func:`~sys_mapping.mocks.generate_lognormal_field`): a Gaussian random field :math:`G(p)` is drawn with ``healpy.synfast`` from a power-law angular power spectrum :math:`C_\ell^G \propto (\ell+1)^{-2}`, normalised so that :math:`\mathrm{Var}(G) = \sigma^2`: .. code-block:: python ell = np.arange(lmax + 1) cl_g = (ell + 1.0)**(-2) cl_g /= cl_g.sum() # normalise to unit variance cl_g *= sigma**2 # scale to target variance G = hp.synfast(cl_g, nside=nside, lmax=lmax) delta_g_true = np.exp(G - sigma**2 / 2.0) - 1.0 # ensures E[δ] = 0 The :math:`-\sigma^2/2` shift is the standard log-normal mean correction that ensures :math:`\mathbb{E}[\delta_g] = 0` exactly (when :math:`G \sim \mathcal{N}(0, \sigma^2)` the expectation of :math:`e^G` is :math:`e^{\sigma^2/2}`). *Contamination injection*: the true overdensity is contaminated using :func:`~sys_mapping.contamination.apply_contamination` with the known ground-truth amplitudes ``(a_true, b_true)``, so that parameter recovery tests compare inferred amplitudes directly against the injected values. *Poisson sampling* (:func:`~sys_mapping.mocks.make_mock_catalog`): .. code-block:: python n_g = rng.poisson(n_mean * (1.0 + delta_g_obs)) # (n_pix,) integer counts # Random catalog: N_r = 8 * n_mean * n_pix objects uniformly over footprint N_r = int(8 * n_mean * good_pixels.sum()) The ratio :math:`N_r / N_g \approx 8` ensures that the random catalog shot noise contributes negligibly to the Landy–Szalay estimator variance (Landy & Szalay 1993 recommend :math:`N_r / N_g \geq 5`). **Functions:** :func:`~sys_mapping.mocks.generate_lognormal_field`, :func:`~sys_mapping.mocks.make_galactic_mask`, :func:`~sys_mapping.mocks.make_mock_catalog`, :func:`~sys_mapping.mocks.make_mock_suite`, :func:`~sys_mapping.mocks.MockCatalog` ---- HEALPix map utilities ---------------------- **Module:** :mod:`sys_mapping.maps` `HEALPix `_ (Hierarchical Equal Area isoLatitude Pixelisation) divides the sphere into :math:`N_{\rm pix} = 12\,{\rm NSIDE}^2` equal-area pixels. Common NSIDE values and their properties: .. list-table:: :header-rows: 1 :widths: 15 25 30 30 * - NSIDE - :math:`N_{\rm pix}` - Pixel area (deg²) - Angular resolution (arcmin) * - 32 - 12 288 - 3.36 - ≈ 110 * - 64 - 49 152 - 0.84 - ≈ 55 * - 128 - 196 608 - 0.21 - ≈ 27 * - 512 - 3 145 728 - 0.013 - ≈ 7 For LS10 BGS analyses NSIDE = 64–128 is used. **Workflow** to go from catalogs to overdensity and template arrays: 1. :func:`~sys_mapping.maps.pixelize_catalog` — bin RA/Dec positions into a HEALPix count map. 2. :func:`~sys_mapping.maps.compute_overdensity` — form :math:`\hat\delta_g(p) = n_g(p)/(f_r n_r(p)) - 1` and return the boolean unmasked-pixel array. 3. :func:`~sys_mapping.maps.assign_template_values` — extract template values at unmasked pixels, returning a ``(n_sys, n_good_pix)`` array. **Technical implementation.** The three-step catalog-to-overdensity pipeline executes as follows: *Step 1 — pixelise* (:func:`~sys_mapping.maps.pixelize_catalog`): .. code-block:: python pix_ids = hp.ang2pix(nside, ra, dec, lonlat=True) # RA/Dec → HEALPix ring index count_map = np.bincount(pix_ids, minlength=12*nside**2).astype(float) ``hp.ang2pix`` with ``lonlat=True`` accepts RA/Dec in degrees. The result is a ``(12×NSIDE²,)`` integer count map covering the full sky (zeros outside the footprint). The bincount is :math:`O(N_{\rm gal})`. *Step 2 — overdensity* (:func:`~sys_mapping.maps.compute_overdensity`): .. code-block:: python f_r = n_gal.sum() / n_rand.sum() # galaxy-to-random normalisation good_pixels = (n_rand > 0) & mask # boolean, (n_pix,) delta_g_full = np.zeros(n_pix) delta_g_full[good_pixels] = ( n_gal[good_pixels] / (f_r * n_rand[good_pixels]) - 1.0) The normalisation factor :math:`f_r = N_g / N_r` accounts for the different total sizes of the galaxy and random catalogs. Pixels with zero random count are masked to avoid division by zero. *Step 3 — template extraction* (:func:`~sys_mapping.maps.assign_template_values`): .. code-block:: python delta_t = template_full_sky[:, good_pixels] # (n_sys, n_good_pix) # Centre and normalise each template to zero mean, unit variance delta_t -= delta_t.mean(axis=1, keepdims=True) delta_t /= delta_t.std(axis=1, keepdims=True) The centring and normalisation ensure that template amplitudes :math:`a_i, b_i` are dimensionless and comparable across templates of different physical origins. **Functions:** :func:`~sys_mapping.maps.pixelize_catalog`, :func:`~sys_mapping.maps.compute_overdensity`, :func:`~sys_mapping.maps.assign_template_values`, :func:`~sys_mapping.maps.generate_systematic_map`, :func:`~sys_mapping.maps.generate_systematic_maps`, :func:`~sys_mapping.maps.systematic_power_spectrum` ---- Two-point function measurement -------------------------------- **Module:** :mod:`sys_mapping.utils` The angular two-point correlation function :math:`w(\theta)` is measured using the **Landy-Szalay estimator**: .. math:: w(\theta) = \frac{DD(\theta) - 2\,DR(\theta) + RR(\theta)}{RR(\theta)} where :math:`DD(\theta)`, :math:`DR(\theta)`, :math:`RR(\theta)` are the normalised pair counts in angular separation bin :math:`\theta` for galaxy-galaxy, galaxy-random, and random-random pairs respectively. The estimator is unbiased for large random catalogs and has minimum variance for a Poisson-distributed galaxy field (Landy & Szalay 1993, ApJ 412, 64). ``sys_mapping`` provides wrappers for `TreeCorr `_ (which uses a ball-tree for efficient pair counting, :math:`\mathcal{O}(N\log N)`) and for `Corrfunc `_ (highly optimised C implementation for large catalogs). **Technical implementation.** *TreeCorr backend* (:func:`~sys_mapping.utils.measure_two_point_function`): wraps ``treecorr.NNCorrelation`` with a ball-tree pair counter. The angular bins are specified in log-space (``min_sep``, ``max_sep`` in degrees, ``nbins`` bins), and the catalog is passed once for DD and once for DR: .. code-block:: python cat_g = treecorr.Catalog(ra=ra_g, dec=dec_g, ra_units="deg", dec_units="deg") cat_r = treecorr.Catalog(ra=ra_r, dec=dec_r, ra_units="deg", dec_units="deg") nn = treecorr.NNCorrelation(min_sep=min_sep, max_sep=max_sep, nbins=nbins, sep_units="deg") nn.process(cat_g) # DD counts nr = nn.copy(); nr.process(cat_g, cat_r) # DR counts rr = nn.copy(); rr.process(cat_r) # RR counts xi, varxi = nn.calculateXi(rr=rr, dr=nr) # Landy–Szalay TreeCorr's ball-tree algorithm has :math:`O(N \log N)` pair-count complexity; it applies the Landy–Szalay formula internally. *Corrfunc backend* (:func:`~sys_mapping.utils.measure_two_point_function_corrfunc`): calls ``Corrfunc.mocks.DDtheta_mocks`` directly on RA/Dec arrays. Corrfunc is a highly optimised C library that vectorises the inner pair loop with AVX2 SIMD instructions and outperforms TreeCorr by 2–5× for :math:`N \gtrsim 10^5`: .. code-block:: python DD = Corrfunc.mocks.DDtheta_mocks(autocorr=True, nthreads=n_threads, binfile=theta_bins, RA=ra_g, DEC=dec_g) DR = Corrfunc.mocks.DDtheta_mocks(autocorr=False, nthreads=n_threads, binfile=theta_bins, RA1=ra_g, DEC1=dec_g, RA2=ra_r, DEC2=dec_r) RR = Corrfunc.mocks.DDtheta_mocks(autocorr=True, nthreads=n_threads, binfile=theta_bins, RA=ra_r, DEC=dec_r) w_theta = (DD - 2*DR + RR) / RR # Landy–Szalay (normalised pair counts) Both backends return the same ``(n_bins,)`` array ``w_theta`` and are interchangeable. The Corrfunc backend should be preferred for production runs with :math:`N > 10^6` galaxies. **Functions:** :func:`~sys_mapping.utils.measure_two_point_function`, :func:`~sys_mapping.utils.measure_two_point_function_corrfunc`, :func:`~sys_mapping.utils.measure_kk_correlation_treecorr`, :func:`~sys_mapping.utils.measure_kk_correlation_corrfunc`, :func:`~sys_mapping.utils.compute_covariance_matrix`, :func:`~sys_mapping.utils.compute_amplitude_bias` ---- Method comparison table ------------------------ .. list-table:: :header-rows: 1 :widths: 30 28 14 28 * - Method - Reference(s) - Status - Module * - Additive contamination model (MCMC) - Berlfein+2024 - Implemented - ``likelihood``, ``inference`` * - Multiplicative contamination model (MCMC) - Berlfein+2024 - Implemented - ``likelihood``, ``inference`` * - Combined contamination model (MCMC) - Berlfein+2024 - Implemented - ``likelihood``, ``inference`` * - PCA template rotation - Berlfein+2024 - Implemented - ``correction`` * - Noise debiasing - Berlfein+2024 - Implemented - ``correction`` * - Two-point function correction - Berlfein+2024 - Implemented - ``contamination``, ``correction`` * - Likelihood ratio model selection - Berlfein+2024 - Implemented - ``model_selection`` * - Block bootstrap covariance - Berlfein+2024 - Implemented - ``bootstrap`` * - Jack-knife covariance - Ross+2011, Ho+2012 - Implemented - ``bootstrap`` * - Null test cross-correlations - Ross+2011 - Implemented - ``diagnostics`` * - Pseudo-:math:`C_\ell` estimator - Elsner+2016, Ho+2012 - Implemented - ``power_spectrum`` * - Harmonic template subtraction - Elsner+2016 - Implemented - ``power_spectrum`` * - Harmonic bias correction - Elsner+2016 - Implemented - ``power_spectrum`` * - Basic / Extended mode projection (BMP/EMP) - Elsner+2016, Leistedt+2014 - Implemented - ``power_spectrum`` * - Harmonic power spectrum correction pipeline - Elsner+2016, Elsner+2017 - Implemented - ``correction`` * - ElasticNet regression - Weaverdyck+2021 - Implemented - ``regression`` * - Iterative Systematics Decontamination (ISD) - Rodríguez-Monroy+2025 - Implemented - ``regression`` * - Multi-method comparison framework - Weaverdyck+2021 - Implemented - ``regression`` * - SNR-based template ranking - Tanidis+2026 - Implemented - ``diagnostics`` * - Footprint masking diagnostics - Rodríguez-Monroy+2025 - Implemented - ``diagnostics`` * - NaMaster coupling matrix (exact) - Alonso+2019, Elsner+2017 - Optional backend (planned) - ``power_spectrum`` * - Catalogue-based PCL deprojection - Cornish+2026 - Deferred - ``deprojection`` (future) * - Transfer function calibration - Cornish+2026 - Deferred - ``deprojection`` (future) * - ANN / deep-learning weights - Rezaie+2020 - Not planned - External pre-processing * - Optimal quadratic estimator (QMV) - Ho+2012 - Not planned - External (NaMaster) * - Shear E/B quadratic estimator - Tanidis+2026 - Not planned - Weak-lensing specific