Testing

sys_mapping has a comprehensive test suite in the tests/ directory. Tests are organised by module and cover correctness, numerical accuracy, and performance.

Running the tests

Install the package with the development extras first:

pip install -e ".[dev]"

Then run the full suite:

pytest tests/ -v

Run a single module:

pytest tests/test_contamination.py -v

Run tests that require scikit-learn (skipped otherwise):

pip install scikit-learn
pytest tests/test_regression.py -v

Run performance benchmarks (slow; uses --benchmark marker):

pytest tests/test_benchmarks.py -v

Test modules

File	Tests	Coverage
`test_contamination.py`	15	`contamination` — parameter layout, pack/unpack round-trip, forward model, two-point correction formula
`test_likelihood.py`	7	`likelihood` — Gaussian and skew-normal log-likelihoods, JAX gradient correctness via finite differences
`test_maps.py`	32	`maps` — power spectrum generation, synthetic map generation, catalog pixelisation, overdensity computation; plus `load_real_template` / `load_real_templates` (5 synthetic FITS fixture tests + 7 real-data tests guarded by `@real_data`)
`test_correction.py`	12	`correction` — noise debiasing, PCA rotation, round-trip param transform, two-point correction
`test_model_selection.py`	7	`model_selection` — LRT statistic, chi-squared distribution under the null, p-value monotonicity
`test_accuracy.py`	34	End-to-end accuracy: contamination round-trip errors < 1e-10; overdensity mean near zero; likelihood gradient norm; correction bias below tolerance; amplitude bias estimator
`test_power_spectrum.py`	19	`power_spectrum` — harmonic bias formula \(b_\ell = -n/(2\ell+1)\), pseudo-Cℓ white noise validation, template subtraction identity (alpha=0), BMP/EMP mode projection
`test_regression.py`	26	`regression` — weight bounds, ElasticNet recovery (skipped without scikit-learn), ISD convergence and polynomial order reduction, method comparison output keys; `combined_mcmc` keys and default-method assertion.
`test_diagnostics.py`	19	`diagnostics` — null test on uncorrelated fields, SNR ranking shape and ordering, footprint masking output consistency
`test_mocks.py`	40	`mocks` — lognormal field properties (skewness, positivity, reproducibility), galactic mask geometry, mock catalog shapes and physics, suite scenarios, pipeline integration
`test_benchmarks.py`	19	Execution time regression: contamination, maps, likelihood, correction, and utils must complete within set thresholds
`test_timing.py`	240	Wall-clock scaling of all six methods as a function of NSIDE (8, 16, 32, 64) and n_templates (1–10); 40 configs × 4 NSIDE values = 160 tests per fast method; MCMC tests marked `@pytest.mark.slow` (skip with `-m "not slow"`)
`test_real_templates.py`	28	Integration tests using real GAIA and LS DR10 HEALPix maps as systematic templates (synth_5, synth_6); all six methods exercised on a synthetic mock built within the LS10 survey footprint (NSIDE = 32, ~5 954 valid pixels). Tests skipped automatically if FITS files are absent. See Real-template integration tests below.

Test design philosophy

Unit tests check individual functions in isolation:

Input/output shapes are always verified.
Round-trip identities are tested (e.g. apply_contamination ∘ invert_contamination = identity).
Known closed-form results are tested to machine precision where applicable (e.g. the harmonic bias formula, pack/unpack).

Accuracy tests (test_accuracy.py) verify that numerical errors remain below physically meaningful thresholds across a range of NSIDE values, template counts, and noise levels.

Integration tests (TestMockPipelineIntegration in test_mocks.py) run the full pipeline from raw catalogs to overdensity estimates and confirm that OLS recovers injected additive amplitudes to within 50 % at low signal-to-noise (NSIDE = 16, 100 galaxies / pixel).

Skipped tests — test_real_templates.py tests are skipped when the GAIA and LS10 FITS files are absent from ~/data/legacysurvey/dr10/systematics/. ElasticNet tests in test_regression.py are skipped when scikit-learn is not installed.

Continuous integration

The test suite is designed to run in the sys_map conda environment:

conda activate sys_map
pytest tests/ -v --tb=short

Expected output with scikit-learn and real data files present:

258 passed in ~34 s

Without real data files (no GAIA/LS10 FITS):

230 passed in ~21 s

Systematic test matrix

Beyond the unit and integration test suite, a separate systematic test matrix is available in scripts/run_systematic_tests.py. This script is not run as part of pytest; it is a standalone benchmark that exhaustively evaluates all six methods across 32 contamination configurations (Tier 1: additive-only and multiplicative-only with 1–7 templates; Tier 2: mixed additive+multiplicative with varying numbers of multiplicative templates).

The key metric is \(\sigma[(1+\delta_g^{\rm corr})/(1+\delta_g^{\rm true})]\), the standard deviation of the pixel-level correction ratio. Results and figures are documented in Model test matrix.

To run:

conda activate sys_map
python scripts/run_systematic_tests.py \\
    --nside 32 --n-walkers 64 --n-steps 200 --n-burn 50 \\
    --output-dir results/systematic_tests/

Expected runtime: ~8 minutes (NSIDE = 32, all 6 methods, 32 configurations).

After re-running with the updated script, the CSV will contain a time_s column per method-configuration row, and two timing figures are written to results/systematic_tests/:

timing_vs_ntemplates.png — wall-clock time vs. n_templates per method (log scale)
timing_mean_per_method.png — mean compute time per method across all 32 configs

Timing scaling tests

tests/test_timing.py parametrises all six methods over NSIDE ∈ {8, 16, 32, 64} and n_templates ∈ {1, …, 10}, measuring actual wall-clock time for each (NSIDE, n_templates) pair. MCMC tests are marked @pytest.mark.slow and can be skipped:

pytest tests/test_timing.py -v -s -m "not slow"   # OLS / ElasticNet / ISD only
pytest tests/test_timing.py -v -s                  # all methods (slow)

The -s flag is required to see the per-test timing printed to stdout. Each test asserts that the method completes within a generous wall-clock bound (30 s for OLS/ISD, 120 s for ElasticNet/ISD-3, 600 s for MCMC), so the suite fails only if a method becomes catastrophically slow. Use the stdout output for the actual scaling data.

Expected test count: 40 (NSIDE × n_templates) per method class × 6 classes = 240 tests (excluding MCMC slow tests from the default run: 160 fast tests).

Adding new tests

Follow the existing pattern:

Create a test class per public function group (e.g. class TestMyFunction).
Use @pytest.fixture(scope="class") for expensive shared objects.
Test shapes, dtype, value ranges, and known analytic limits.
For stochastic tests, fix the seed via seed= or np.random.default_rng(N).
Add the new file to this page.

Real-template integration tests

tests/test_real_templates.py tests every implemented method end-to-end on a synthetic galaxy mock built using real observational systematic maps from GAIA DR3 and the Legacy Survey DR10. These tests exercise the entire pipeline — loading, normalisation, injection, inference, and model selection — with physically realistic templates rather than toy random fields.

Template set

Label	Source	NSIDE (tests)	Physical interpretation
`synth_0`	Synthetic, family 0 (\(C_\ell \propto e^{-\ell/500}\))	32	Large-scale coherent artefact (e.g. zodiacal light)
`synth_1`	Synthetic, family 1 (\(C_\ell \propto e^{-(\ell/250)^2}\))	32	Intermediate-scale artefact (e.g. airglow)
synth_5	GAIA DR3 — `GAIA_nstar_faint_NSIDE_00032.fits` (`nstar_faint` column)	32	Faint-star surface density — proxy for stellar contamination (misclassified stars inflate galaxy counts)
synth_6	LS DR10 — `LS10_GALDEPTH_Z_NSIDE_0032.fits` (`GALDEPTH_Z` column)	32	Galaxy depth in the z band — proxy for selection-depth variations (depth modulates survey completeness multiplicatively)

Survey footprint: the LS10 depth valid mask (pixels where GALDEPTH_Z > 0) restricts the mock to 5 954 pixels (48.4 % of the NSIDE = 32 sphere). GAIA is a full-sky map (12 288 / 12 288 valid pixels); its values inside the LS10 footprint are used.

Mock configuration and injected parameters

Parameter	Value
NSIDE	32 (pixel area ≈ 3.4 deg², 12 288 pixels total)
Survey footprint pixels	5 954 (LS10 depth mask)
Templates \(n_s\)	4 (synth_0, synth_1, GAIA_nstar_faint, LS10_GALDEPTH_Z)
\(a_i^{\rm true}\) (additive)	\((0.08,\ {-0.05},\ 0.06,\ {-0.04})\)
\(b_i^{\rm true}\) (multiplicative)	\((0.04,\ 0.00,\ {-0.03},\ 0.05)\)
Mean galaxies per pixel \(\bar n\)	50
Random / galaxy ratio	8×
Seed	7

The galaxy overdensity follows a lognormal field \(\delta_g^{\rm true} = e^{G - \sigma_G^2/2} - 1\) with \(C_\ell^G \propto (\ell+1)^{-2}\), \(\sigma_G = 0.5\). Note that template 1 (synth_1) has \(b_1^{\rm true} = 0\) (purely additive) while templates 0, 2, 3 have non-zero multiplicative amplitudes — a realistic mixed-contamination scenario.

Method results

Each of the six implemented methods is run once on this fixed mock:

Method	Mean \(\|\hat a_i - a_i^{\rm true}\|\)	Tolerance	Notes
OLS	< 0.20	0.20	Ordinary least-squares pixel regression; fastest method
ElasticNet	< 0.25	0.25	Cross-validated (3 folds); requires `scikit-learn ≥ 1.3`
ISD-1 (poly_order = 1)	< 0.25	0.25	Converges in < 50 iterations
ISD-3 (poly_order = 3)	n/a (numerically unstable)	finite values only	Polynomial expansion produces 34 features for \(n_s = 4\), \(n_{\rm pix}/n_{\rm feat} \approx 175\); ill-conditioned with real correlated templates. Check finiteness only.
MCMC-additive	< 0.25	0.25	Chain shape \((n_w \times 160,\; n_s + 1)\) with \(n_w \geq 2(n_s+1)+2 = 12\); \(n_{\rm dim} = 5\)
MCMC-combined (Berlfein+2024)	< 0.30 for \(\hat a_i\); < 0.30 for \(\hat b_i\)	0.30	Chain shape \((n_w \times 160,\; 2n_s + 1)\); \(n_{\rm dim} = 9\). Positive posterior variances confirmed.

Model selection and diagnostics

LRT (\(H_0\): additive, \(H_1\): combined) — the additive null is rejected at the 5 % level because \(b_0, b_2, b_3 \neq 0\). The test statistic \(\lambda_{\rm LR} = 2[\ln\mathcal{L}_1 - \ln\mathcal{L}_0]\) is positive and \(p < 0.05\).
Null test — the maximum Pearson correlation \(\max_i |r_i|\) between the OLS-corrected weights and the template maps satisfies \(\max|r_i| < 0.50\), confirming partial residual removal.
SNR ranking — the SNR array has shape \((4,)\), all entries are \(\geq 0\), and at least one entry is \(> 0.01\), demonstrating that the real GAIA and LS10 maps carry detectable systematic signal.

Running the real-template tests

Ensure the FITS files are present at their default paths (see load_real_templates()), then:

conda activate sys_map
pytest tests/test_real_templates.py -v

Expected output:

28 passed in ~62 s

To skip when files are absent, they degrade gracefully:

28 skipped (reason: GAIA/LS10 FITS files not found)

Test outcome

The full test suite was executed in the sys_map conda environment (Python 3.11, JAX 64-bit, scikit-learn ≥ 1.3, real GAIA and LS10 FITS files present) against the current codebase.

Results:

258 passed, 0 failed, 0 errors (excluding test_timing.py)
test_real_templates.py — 28 passed using real GAIA DR3 and LS10 DR10 systematic maps; all six methods completed without error on the 5 954-pixel LS10 footprint.
test_regression.py — 26 passed, including the previously known edge case (TestElasticNet::test_weights_bounded_positive), which now passes.
The systematic test matrix (scripts/run_systematic_tests.py, 32 configurations) ran to completion; results are documented in Model test matrix.

Status: PASSED — all 258 tests pass with no known failures.