Testing

sys_mapping has a comprehensive test suite in the tests/ directory. Tests are organised by module and cover correctness, numerical accuracy, and performance.

Running the tests

Install the package with the development extras first:

pip install -e ".[dev]"

Then run the full suite:

pytest tests/ -v

Run a single module:

pytest tests/test_contamination.py -v

Run tests that require scikit-learn (skipped otherwise):

pip install scikit-learn
pytest tests/test_regression.py -v

Run performance benchmarks (slow; uses --benchmark marker):

pytest tests/test_benchmarks.py -v

Test modules

File

Tests

Coverage

test_contamination.py

15

contamination — parameter layout, pack/unpack round-trip, forward model, two-point correction formula

test_likelihood.py

7

likelihood — Gaussian and skew-normal log-likelihoods, JAX gradient correctness via finite differences

test_maps.py

32

maps — power spectrum generation, synthetic map generation, catalog pixelisation, overdensity computation; plus load_real_template / load_real_templates (5 synthetic FITS fixture tests + 7 real-data tests guarded by @real_data)

test_correction.py

12

correction — noise debiasing, PCA rotation, round-trip param transform, two-point correction

test_model_selection.py

7

model_selection — LRT statistic, chi-squared distribution under the null, p-value monotonicity

test_accuracy.py

34

End-to-end accuracy: contamination round-trip errors < 1e-10; overdensity mean near zero; likelihood gradient norm; correction bias below tolerance; amplitude bias estimator

test_power_spectrum.py

19

power_spectrum — harmonic bias formula \(b_\ell = -n/(2\ell+1)\), pseudo-Cℓ white noise validation, template subtraction identity (alpha=0), BMP/EMP mode projection

test_regression.py

26

regression — weight bounds, ElasticNet recovery (skipped without scikit-learn), ISD convergence and polynomial order reduction, method comparison output keys; combined_mcmc keys and default-method assertion.

test_diagnostics.py

19

diagnostics — null test on uncorrelated fields, SNR ranking shape and ordering, footprint masking output consistency

test_mocks.py

40

mocks — lognormal field properties (skewness, positivity, reproducibility), galactic mask geometry, mock catalog shapes and physics, suite scenarios, pipeline integration

test_benchmarks.py

19

Execution time regression: contamination, maps, likelihood, correction, and utils must complete within set thresholds

test_timing.py

240

Wall-clock scaling of all six methods as a function of NSIDE (8, 16, 32, 64) and n_templates (1–10); 40 configs × 4 NSIDE values = 160 tests per fast method; MCMC tests marked @pytest.mark.slow (skip with -m "not slow")

test_real_templates.py

28

Integration tests using real GAIA and LS DR10 HEALPix maps as systematic templates (synth_5, synth_6); all six methods exercised on a synthetic mock built within the LS10 survey footprint (NSIDE = 32, ~5 954 valid pixels). Tests skipped automatically if FITS files are absent. See Real-template integration tests below.


Test design philosophy

Unit tests check individual functions in isolation:

  • Input/output shapes are always verified.

  • Round-trip identities are tested (e.g. apply_contaminationinvert_contamination = identity).

  • Known closed-form results are tested to machine precision where applicable (e.g. the harmonic bias formula, pack/unpack).

Accuracy tests (test_accuracy.py) verify that numerical errors remain below physically meaningful thresholds across a range of NSIDE values, template counts, and noise levels.

Integration tests (TestMockPipelineIntegration in test_mocks.py) run the full pipeline from raw catalogs to overdensity estimates and confirm that OLS recovers injected additive amplitudes to within 50 % at low signal-to-noise (NSIDE = 16, 100 galaxies / pixel).

Skipped teststest_real_templates.py tests are skipped when the GAIA and LS10 FITS files are absent from ~/data/legacysurvey/dr10/systematics/. ElasticNet tests in test_regression.py are skipped when scikit-learn is not installed.


Continuous integration

The test suite is designed to run in the sys_map conda environment:

conda activate sys_map
pytest tests/ -v --tb=short

Expected output with scikit-learn and real data files present:

258 passed in ~34 s

Without real data files (no GAIA/LS10 FITS):

230 passed in ~21 s

Systematic test matrix

Beyond the unit and integration test suite, a separate systematic test matrix is available in scripts/run_systematic_tests.py. This script is not run as part of pytest; it is a standalone benchmark that exhaustively evaluates all six methods across 32 contamination configurations (Tier 1: additive-only and multiplicative-only with 1–7 templates; Tier 2: mixed additive+multiplicative with varying numbers of multiplicative templates).

The key metric is \(\sigma[(1+\delta_g^{\rm corr})/(1+\delta_g^{\rm true})]\), the standard deviation of the pixel-level correction ratio. Results and figures are documented in Model test matrix.

To run:

conda activate sys_map
python scripts/run_systematic_tests.py \\
    --nside 32 --n-walkers 64 --n-steps 200 --n-burn 50 \\
    --output-dir results/systematic_tests/

Expected runtime: ~8 minutes (NSIDE = 32, all 6 methods, 32 configurations).

After re-running with the updated script, the CSV will contain a time_s column per method-configuration row, and two timing figures are written to results/systematic_tests/:

  • timing_vs_ntemplates.png — wall-clock time vs. n_templates per method (log scale)

  • timing_mean_per_method.png — mean compute time per method across all 32 configs


Timing scaling tests

tests/test_timing.py parametrises all six methods over NSIDE ∈ {8, 16, 32, 64} and n_templates ∈ {1, …, 10}, measuring actual wall-clock time for each (NSIDE, n_templates) pair. MCMC tests are marked @pytest.mark.slow and can be skipped:

pytest tests/test_timing.py -v -s -m "not slow"   # OLS / ElasticNet / ISD only
pytest tests/test_timing.py -v -s                  # all methods (slow)

The -s flag is required to see the per-test timing printed to stdout. Each test asserts that the method completes within a generous wall-clock bound (30 s for OLS/ISD, 120 s for ElasticNet/ISD-3, 600 s for MCMC), so the suite fails only if a method becomes catastrophically slow. Use the stdout output for the actual scaling data.

Expected test count: 40 (NSIDE × n_templates) per method class × 6 classes = 240 tests (excluding MCMC slow tests from the default run: 160 fast tests).


Adding new tests

Follow the existing pattern:

  1. Create a test class per public function group (e.g. class TestMyFunction).

  2. Use @pytest.fixture(scope="class") for expensive shared objects.

  3. Test shapes, dtype, value ranges, and known analytic limits.

  4. For stochastic tests, fix the seed via seed= or np.random.default_rng(N).

  5. Add the new file to this page.


Real-template integration tests

tests/test_real_templates.py tests every implemented method end-to-end on a synthetic galaxy mock built using real observational systematic maps from GAIA DR3 and the Legacy Survey DR10. These tests exercise the entire pipeline — loading, normalisation, injection, inference, and model selection — with physically realistic templates rather than toy random fields.

Template set

Label

Source

NSIDE (tests)

Physical interpretation

synth_0

Synthetic, family 0 (\(C_\ell \propto e^{-\ell/500}\))

32

Large-scale coherent artefact (e.g. zodiacal light)

synth_1

Synthetic, family 1 (\(C_\ell \propto e^{-(\ell/250)^2}\))

32

Intermediate-scale artefact (e.g. airglow)

synth_5

GAIA DR3GAIA_nstar_faint_NSIDE_00032.fits (nstar_faint column)

32

Faint-star surface density — proxy for stellar contamination (misclassified stars inflate galaxy counts)

synth_6

LS DR10LS10_GALDEPTH_Z_NSIDE_0032.fits (GALDEPTH_Z column)

32

Galaxy depth in the z band — proxy for selection-depth variations (depth modulates survey completeness multiplicatively)

Survey footprint: the LS10 depth valid mask (pixels where GALDEPTH_Z > 0) restricts the mock to 5 954 pixels (48.4 % of the NSIDE = 32 sphere). GAIA is a full-sky map (12 288 / 12 288 valid pixels); its values inside the LS10 footprint are used.

Mock configuration and injected parameters

Parameter

Value

NSIDE

32 (pixel area ≈ 3.4 deg², 12 288 pixels total)

Survey footprint pixels

5 954 (LS10 depth mask)

Templates \(n_s\)

4 (synth_0, synth_1, GAIA_nstar_faint, LS10_GALDEPTH_Z)

\(a_i^{\rm true}\) (additive)

\((0.08,\ {-0.05},\ 0.06,\ {-0.04})\)

\(b_i^{\rm true}\) (multiplicative)

\((0.04,\ 0.00,\ {-0.03},\ 0.05)\)

Mean galaxies per pixel \(\bar n\)

50

Random / galaxy ratio

Seed

7

The galaxy overdensity follows a lognormal field \(\delta_g^{\rm true} = e^{G - \sigma_G^2/2} - 1\) with \(C_\ell^G \propto (\ell+1)^{-2}\), \(\sigma_G = 0.5\). Note that template 1 (synth_1) has \(b_1^{\rm true} = 0\) (purely additive) while templates 0, 2, 3 have non-zero multiplicative amplitudes — a realistic mixed-contamination scenario.

Method results

Each of the six implemented methods is run once on this fixed mock:

Method

Mean \(|\hat a_i - a_i^{\rm true}|\)

Tolerance

Notes

OLS

< 0.20

0.20

Ordinary least-squares pixel regression; fastest method

ElasticNet

< 0.25

0.25

Cross-validated (3 folds); requires scikit-learn 1.3

ISD-1 (poly_order = 1)

< 0.25

0.25

Converges in < 50 iterations

ISD-3 (poly_order = 3)

n/a (numerically unstable)

finite values only

Polynomial expansion produces 34 features for \(n_s = 4\), \(n_{\rm pix}/n_{\rm feat} \approx 175\); ill-conditioned with real correlated templates. Check finiteness only.

MCMC-additive

< 0.25

0.25

Chain shape \((n_w \times 160,\; n_s + 1)\) with \(n_w \geq 2(n_s+1)+2 = 12\); \(n_{\rm dim} = 5\)

MCMC-combined (Berlfein+2024)

< 0.30 for \(\hat a_i\); < 0.30 for \(\hat b_i\)

0.30

Chain shape \((n_w \times 160,\; 2n_s + 1)\); \(n_{\rm dim} = 9\). Positive posterior variances confirmed.

Model selection and diagnostics

  • LRT (\(H_0\): additive, \(H_1\): combined) — the additive null is rejected at the 5 % level because \(b_0, b_2, b_3 \neq 0\). The test statistic \(\lambda_{\rm LR} = 2[\ln\mathcal{L}_1 - \ln\mathcal{L}_0]\) is positive and \(p < 0.05\).

  • Null test — the maximum Pearson correlation \(\max_i |r_i|\) between the OLS-corrected weights and the template maps satisfies \(\max|r_i| < 0.50\), confirming partial residual removal.

  • SNR ranking — the SNR array has shape \((4,)\), all entries are \(\geq 0\), and at least one entry is \(> 0.01\), demonstrating that the real GAIA and LS10 maps carry detectable systematic signal.

Running the real-template tests

Ensure the FITS files are present at their default paths (see load_real_templates()), then:

conda activate sys_map
pytest tests/test_real_templates.py -v

Expected output:

28 passed in ~62 s

To skip when files are absent, they degrade gracefully:

28 skipped (reason: GAIA/LS10 FITS files not found)

Test outcome

The full test suite was executed in the sys_map conda environment (Python 3.11, JAX 64-bit, scikit-learn ≥ 1.3, real GAIA and LS10 FITS files present) against the current codebase.

Results:

  • 258 passed, 0 failed, 0 errors (excluding test_timing.py)

  • test_real_templates.py28 passed using real GAIA DR3 and LS10 DR10 systematic maps; all six methods completed without error on the 5 954-pixel LS10 footprint.

  • test_regression.py26 passed, including the previously known edge case (TestElasticNet::test_weights_bounded_positive), which now passes.

  • The systematic test matrix (scripts/run_systematic_tests.py, 32 configurations) ran to completion; results are documented in Model test matrix.

Status: PASSED — all 258 tests pass with no known failures.