Testing
sys_mapping has a comprehensive test suite in the tests/ directory.
Tests are organised by module and cover correctness, numerical accuracy,
and performance.
Running the tests
Install the package with the development extras first:
pip install -e ".[dev]"
Then run the full suite:
pytest tests/ -v
Run a single module:
pytest tests/test_contamination.py -v
Run tests that require scikit-learn (skipped otherwise):
pip install scikit-learn
pytest tests/test_regression.py -v
Run performance benchmarks (slow; uses --benchmark marker):
pytest tests/test_benchmarks.py -v
Test modules
| File | Tests | Coverage |
|---|---|---|
| | 15 | |
| | 7 | |
| | 32 | |
| | 12 | |
| | 7 | |
| test_accuracy.py | 34 | End-to-end accuracy: contamination round-trip errors < 1e-10; overdensity mean near zero; likelihood gradient norm; correction bias below tolerance; amplitude bias estimator |
| | 19 | |
| test_regression.py | 26 | |
| | 19 | |
| | 40 | |
| | 19 | Execution time regression: contamination, maps, likelihood, correction, and utils must complete within fixed thresholds |
| test_timing.py | 240 | Wall-clock scaling of all six methods as a function of NSIDE (8, 16, 32, 64) and n_templates (1–10): 40 configurations per method × 6 methods = 240 tests; the 80 MCMC tests are marked @pytest.mark.slow, leaving 160 fast tests |
| test_real_templates.py | 28 | Integration tests using real GAIA and LS DR10 HEALPix maps as systematic templates (synth_5, synth_6); all six methods are exercised on a synthetic mock built within the LS10 survey footprint (NSIDE = 32, 5 954 valid pixels). Tests are skipped automatically if the FITS files are absent. See Real-template integration tests below. |
Test design philosophy
Unit tests check individual functions in isolation:
- Input/output shapes are always verified.
- Round-trip identities are tested (e.g. apply_contamination ∘ invert_contamination = identity).
- Known closed-form results are tested to machine precision where applicable (e.g. the harmonic bias formula, pack/unpack).
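The round-trip identity described above can be illustrated with a small, self-contained check. This is a sketch only: `apply_contamination_sketch` and `invert_contamination_sketch` are hypothetical stand-ins, and the combined model they implement, \(1+\delta_{\rm obs} = (1+\delta_{\rm true})(1+\sum_i b_i t_i) + \sum_i a_i t_i\), is an assumption rather than the package's confirmed definition. Pure Python is used so the example runs without NumPy.

```python
import random


def _mixing(p, templates, coeffs):
    """sum_i coeffs[i] * templates[i][p] for a single pixel p."""
    return sum(c * t[p] for c, t in zip(coeffs, templates))


def apply_contamination_sketch(delta_true, templates, a, b):
    """Hypothetical model (assumption):
    1 + delta_obs = (1 + delta_true) * (1 + sum_i b_i t_i) + sum_i a_i t_i."""
    return [
        (1.0 + d) * (1.0 + _mixing(p, templates, b)) + _mixing(p, templates, a) - 1.0
        for p, d in enumerate(delta_true)
    ]


def invert_contamination_sketch(delta_obs, templates, a, b):
    """Exact algebraic inverse of apply_contamination_sketch."""
    return [
        (1.0 + d - _mixing(p, templates, a)) / (1.0 + _mixing(p, templates, b)) - 1.0
        for p, d in enumerate(delta_obs)
    ]


rng = random.Random(7)  # fixed seed for reproducibility
n_pix, n_s = 1000, 4
templates = [[rng.gauss(0.0, 1.0) for _ in range(n_pix)] for _ in range(n_s)]
delta_true = [rng.gauss(0.0, 0.3) for _ in range(n_pix)]
a = [0.08, -0.05, 0.06, -0.04]  # additive amplitudes (values from the mock table)
b = [0.04, 0.00, -0.03, 0.05]   # multiplicative amplitudes

delta_obs = apply_contamination_sketch(delta_true, templates, a, b)
recovered = invert_contamination_sketch(delta_obs, templates, a, b)
max_err = max(abs(r - t) for r, t in zip(recovered, delta_true))
print(f"max round-trip error: {max_err:.2e}")
```

Because the inverse is exact algebra, the residual is pure floating-point round-off, which is why a threshold such as 1e-10 is comfortably met.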
Accuracy tests (test_accuracy.py) verify that numerical errors
remain below physically meaningful thresholds across a range of NSIDE
values, template counts, and noise levels.
Integration tests (TestMockPipelineIntegration in test_mocks.py)
run the full pipeline from raw catalogs to overdensity estimates and confirm
that OLS recovers injected additive amplitudes to within 50 % at low
signal-to-noise (NSIDE = 16, 100 galaxies / pixel).
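A rough sketch of the kind of recovery check described above: inject additive amplitudes into a noisy overdensity field, fit them by ordinary least squares, and require each estimate to fall within 50 % of its true value. Assumptions: the templates are uncorrelated unit-variance Gaussian noise (real maps are correlated), the contamination is purely additive, shot noise is approximated as Gaussian with \(\sigma = 1/\sqrt{100}\), and all helper names are hypothetical rather than the package's API. The pixel count 3 072 corresponds to NSIDE = 16.

```python
import math
import random


def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols_amplitudes(delta_obs, templates):
    """OLS estimate of additive amplitudes from the normal equations (T^T T) a = T^T d."""
    n_s = len(templates)
    TtT = [[sum(u * v for u, v in zip(templates[i], templates[j])) for j in range(n_s)]
           for i in range(n_s)]
    Ttd = [sum(u * d for u, d in zip(templates[i], delta_obs)) for i in range(n_s)]
    return solve_linear(TtT, Ttd)


rng = random.Random(16)
n_pix = 12 * 16 ** 2                # 3072 pixels at NSIDE = 16
shot_sigma = 1.0 / math.sqrt(100)   # ~100 galaxies per pixel
a_true = [0.10, -0.08, 0.06]
templates = [[rng.gauss(0.0, 1.0) for _ in range(n_pix)] for _ in a_true]
delta_obs = [
    rng.gauss(0.0, 0.3) + rng.gauss(0.0, shot_sigma)
    + sum(a * t[p] for a, t in zip(a_true, templates))
    for p in range(n_pix)
]
a_hat = ols_amplitudes(delta_obs, templates)
print([round(x, 3) for x in a_hat])
```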
Skipped tests — test_real_templates.py tests are skipped when the
GAIA and LS10 FITS files are absent from
~/data/legacysurvey/dr10/systematics/. ElasticNet tests in
test_regression.py are skipped when scikit-learn is not installed.
Continuous integration
The test suite is designed to run in the sys_map conda environment:
conda activate sys_map
pytest tests/ -v --tb=short
Expected output with scikit-learn and the real data files present (these counts exclude the 240 tests in test_timing.py):
258 passed in ~34 s
Without real data files (no GAIA/LS10 FITS):
230 passed in ~21 s
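The two expected totals can be cross-checked against the per-module counts in the Test modules table: every module except test_timing.py contributes to the 258, and dropping the 28 real-template tests gives 230. A trivial arithmetic sketch (the counts are copied from that table; the grouping is the only assumption):

```python
# Per-module test counts from the "Test modules" table, excluding
# test_timing.py (240 tests) and test_real_templates.py (28 tests).
core_counts = [15, 7, 32, 12, 7, 34, 19, 26, 19, 40, 19]
real_template_tests = 28

without_real_data = sum(core_counts)
with_real_data = without_real_data + real_template_tests
print(with_real_data, without_real_data)  # 258 230
```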
Systematic test matrix
Beyond the unit and integration test suite, a separate systematic test
matrix is available in scripts/run_systematic_tests.py. This script
is not run as part of pytest; it is a standalone benchmark that exhaustively
evaluates all six methods across 32 contamination configurations (Tier 1:
additive-only and multiplicative-only with 1–7 templates; Tier 2: mixed
additive+multiplicative with varying numbers of multiplicative templates).
The key metric is \(\sigma[(1+\delta_g^{\rm corr})/(1+\delta_g^{\rm true})]\), the standard deviation of the pixel-level correction ratio. Results and figures are documented in Model test matrix.
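A minimal sketch of the key metric, assuming the population standard deviation (divisor \(N\), NumPy's default `ddof=0`) over valid pixels; whether run_systematic_tests.py uses the population or the sample form is not stated here, and `correction_ratio_scatter` is an illustrative name. A perfect correction gives a ratio of exactly 1 in every pixel and therefore zero scatter.

```python
import statistics


def correction_ratio_scatter(delta_corr, delta_true):
    """sigma[(1 + delta_corr) / (1 + delta_true)] over the valid pixels."""
    ratios = [(1.0 + c) / (1.0 + t) for c, t in zip(delta_corr, delta_true)]
    return statistics.pstdev(ratios)  # population standard deviation


delta_true = [0.10, -0.20, 0.30, 0.00]
print(correction_ratio_scatter(delta_true, delta_true))  # perfect correction -> 0.0
```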
To run:
conda activate sys_map
python scripts/run_systematic_tests.py \
  --nside 32 --n-walkers 64 --n-steps 200 --n-burn 50 \
  --output-dir results/systematic_tests/
Expected runtime: ~8 minutes (NSIDE = 32, all 6 methods, 32 configurations).
Each run writes a results CSV with a time_s column for every method–configuration
row, plus two timing figures in results/systematic_tests/:
- timing_vs_ntemplates.png: wall-clock time vs. n_templates per method (log scale)
- timing_mean_per_method.png: mean compute time per method across all 32 configurations
Timing scaling tests
tests/test_timing.py parametrises all six methods over NSIDE ∈ {8, 16, 32, 64}
and n_templates ∈ {1, …, 10}, measuring actual wall-clock time for each
(NSIDE, n_templates) pair. MCMC tests are marked @pytest.mark.slow and
can be skipped:
pytest tests/test_timing.py -v -s -m "not slow" # OLS / ElasticNet / ISD only
pytest tests/test_timing.py -v -s # all methods (slow)
The -s flag is required to see the per-test timing printed to stdout.
Each test asserts that the method completes within a generous wall-clock bound
(30 s for OLS/ISD-1, 120 s for ElasticNet/ISD-3, 600 s for MCMC), so the suite
fails only if a method becomes catastrophically slow. Read the actual scaling
data from the stdout output.
Expected test count: 40 (NSIDE, n_templates) pairs per method × 6 methods = 240 tests. Deselecting the two MCMC methods with -m "not slow" leaves 4 × 40 = 160 fast tests.
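The 240/160 arithmetic and the timing pattern can be sketched in plain Python. The method labels follow the method table further down this page; the bounds read the 30 s limit as applying to OLS and ISD-1, since ISD-3 is listed with its own 120 s bound. In the real suite the grid would come from stacked `@pytest.mark.parametrize` decorators rather than `itertools.product`, and `timed` is a hypothetical helper, not part of the package.

```python
import itertools
import time

NSIDES = (8, 16, 32, 64)
N_TEMPLATES = range(1, 11)
# Wall-clock bounds in seconds (assumption: "OLS/ISD" means OLS and ISD-1).
BOUNDS_S = {
    "OLS": 30, "ISD-1": 30,
    "ElasticNet": 120, "ISD-3": 120,
    "MCMC-additive": 600, "MCMC-combined": 600,
}
SLOW = {"MCMC-additive", "MCMC-combined"}  # marked @pytest.mark.slow

grid = list(itertools.product(BOUNDS_S, NSIDES, N_TEMPLATES))
n_total = len(grid)                              # 6 methods x 4 NSIDE x 10 n_templates
n_fast = sum(m not in SLOW for m, _, _ in grid)  # what -m "not slow" keeps


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Placeholder workload standing in for one method call.
_, elapsed = timed(sum, range(100_000))
print(f"{n_total} tests, {n_fast} fast; placeholder took {elapsed:.4f} s")
assert elapsed < BOUNDS_S["OLS"]
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which is the appropriate choice for measuring short intervals.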
Adding new tests
Follow the existing pattern:
- Create a test class per public function group (e.g. class TestMyFunction).
- Use @pytest.fixture(scope="class") for expensive shared objects.
- Test shapes, dtype, value ranges, and known analytic limits.
- For stochastic tests, fix the seed via seed= or np.random.default_rng(N).
- Add the new file to this page.
Real-template integration tests
tests/test_real_templates.py tests every implemented method end-to-end on
a synthetic galaxy mock built using real observational systematic maps
from GAIA DR3 and the Legacy Survey DR10. These tests exercise the entire
pipeline — loading, normalisation, injection, inference, and model selection
— with physically realistic templates rather than toy random fields.
Template set
| Label | Source | NSIDE (tests) | Physical interpretation |
|---|---|---|---|
| synth_0 | Synthetic, family 0 (\(C_\ell \propto e^{-\ell/500}\)) | 32 | Large-scale coherent artefact (e.g. zodiacal light) |
| synth_1 | Synthetic, family 1 (\(C_\ell \propto e^{-(\ell/250)^2}\)) | 32 | Intermediate-scale artefact (e.g. airglow) |
| synth_5 | GAIA DR3 — | 32 | Faint-star surface density: proxy for stellar contamination (misclassified stars inflate galaxy counts) |
| synth_6 | LS DR10 — | 32 | Galaxy depth in the z band: proxy for selection-depth variations (depth modulates survey completeness multiplicatively) |
Survey footprint: the LS10 depth valid mask (pixels where GALDEPTH_Z > 0) restricts the mock to 5 954 pixels (48.5 % of the NSIDE = 32 sphere). GAIA is a full-sky map (12 288 / 12 288 valid pixels); only its values inside the LS10 footprint are used.
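The pixel counts and areas quoted in this section follow from the HEALPix relation \(N_{\rm pix} = 12\,N_{\rm side}^2\) and the full-sky area \(4\pi\) sr \(\approx 41\,253\) deg². A quick check in plain Python (no healpy needed; the function names are illustrative):

```python
import math


def healpix_npix(nside):
    """Number of HEALPix pixels on the full sphere."""
    return 12 * nside * nside


def pixel_area_deg2(nside):
    """Area of one HEALPix pixel in square degrees (all pixels have equal area)."""
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41 253 deg^2
    return full_sky_deg2 / healpix_npix(nside)


nside = 32
npix = healpix_npix(nside)
footprint_fraction = 5954 / npix
print(npix, round(pixel_area_deg2(nside), 2), round(100 * footprint_fraction, 2))
# -> 12288 3.36 48.45
```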
Mock configuration and injected parameters
| Parameter | Value |
|---|---|
| NSIDE | 32 (pixel area ≈ 3.4 deg², 12 288 pixels total) |
| Survey footprint pixels | 5 954 (LS10 depth mask) |
| Templates \(n_s\) | 4 (synth_0, synth_1, GAIA_nstar_faint, LS10_GALDEPTH_Z) |
| \(a_i^{\rm true}\) (additive) | \((0.08,\ {-0.05},\ 0.06,\ {-0.04})\) |
| \(b_i^{\rm true}\) (multiplicative) | \((0.04,\ 0.00,\ {-0.03},\ 0.05)\) |
| Mean galaxies per pixel \(\bar n\) | 50 |
| Random / galaxy ratio | 8× |
| Seed | 7 |
The galaxy overdensity follows a lognormal field
\(\delta_g^{\rm true} = e^{G - \sigma_G^2/2} - 1\) with
\(C_\ell^G \propto (\ell+1)^{-2}\), \(\sigma_G = 0.5\).
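The \(-\sigma_G^2/2\) shift is what keeps the lognormal field unbiased: for Gaussian \(G\) with zero mean and variance \(\sigma_G^2\), \(\langle e^{G}\rangle = e^{\sigma_G^2/2}\), so \(\langle \delta_g^{\rm true}\rangle = 0\) while \(\delta_g^{\rm true} > -1\) everywhere. The sketch below checks this numerically with uncorrelated (white) Gaussian draws; it deliberately ignores the \(C_\ell^G \propto (\ell+1)^{-2}\) angular spectrum, which in practice would be imposed on \(G\) with a spherical-harmonic synthesis tool such as healpy's `synfast`. The function name is illustrative.

```python
import math
import random
import statistics


def lognormal_overdensity(gaussian_field, sigma_g):
    """delta = exp(G - sigma_G^2 / 2) - 1: zero mean and bounded below by -1."""
    return [math.exp(g - 0.5 * sigma_g ** 2) - 1.0 for g in gaussian_field]


sigma_g = 0.5
rng = random.Random(7)
# White Gaussian draws with variance sigma_g^2 (no angular correlations in this sketch).
G = [rng.gauss(0.0, sigma_g) for _ in range(200_000)]
delta = lognormal_overdensity(G, sigma_g)
print(f"mean = {statistics.fmean(delta):+.4f}, min = {min(delta):.3f}")
```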
Note that template 1 (synth_1) has \(b_1^{\rm true} = 0\) (purely
additive) while templates 0, 2, 3 have non-zero multiplicative amplitudes —
a realistic mixed-contamination scenario.
Method results
Each of the six implemented methods is run once on this fixed mock:
| Method | Mean \(\lvert\hat a_i - a_i^{\rm true}\rvert\) | Tolerance | Notes |
|---|---|---|---|
| OLS | < 0.20 | 0.20 | Ordinary least-squares pixel regression; fastest method |
| ElasticNet | < 0.25 | 0.25 | Cross-validated (3 folds); requires scikit-learn |
| ISD-1 (poly_order = 1) | < 0.25 | 0.25 | Converges in < 50 iterations |
| ISD-3 (poly_order = 3) | n/a (numerically unstable) | finite values only | Polynomial expansion produces 34 features for \(n_s = 4\) (\(n_{\rm pix}/n_{\rm feat} \approx 175\)) and is ill-conditioned with real, correlated templates, so only finiteness is checked. |
| MCMC-additive | < 0.25 | 0.25 | Chain shape \((n_w \times 160,\; n_s + 1)\) with \(n_w \geq 2(n_s+1)+2 = 12\); \(n_{\rm dim} = 5\) |
| MCMC-combined (Berlfein+2024) | < 0.30 for \(\hat a_i\); < 0.30 for \(\hat b_i\) | 0.30 | Chain shape \((n_w \times 160,\; 2n_s + 1)\); \(n_{\rm dim} = 9\). Posterior variances are confirmed positive. |
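Several numbers in the table follow from simple counting. The 34 ISD-3 features are reproduced by counting every monomial of total degree 1 to 3 in the \(n_s = 4\) templates, \(\binom{n_s+3}{3} - 1\); we assume that is how the polynomial expansion is built. The chain dimensions are \(n_s + 1\) and \(2n_s + 1\), and the walker minimum uses the rule \(n_w \geq 2(n_s+1)+2\) stated for MCMC-additive. `within_tolerance` is a hypothetical helper mirroring the mean-absolute-error criterion in the second column.

```python
from math import comb


def n_poly_features(n_s, order):
    """Monomials of total degree 1..order in n_s variables (constant term excluded)."""
    return comb(n_s + order, order) - 1


def within_tolerance(a_hat, a_true, tol):
    """True if the mean |a_hat_i - a_true_i| is below tol."""
    mae = sum(abs(e - t) for e, t in zip(a_hat, a_true)) / len(a_true)
    return mae < tol


n_s, n_pix = 4, 5954
n_feat_isd3 = n_poly_features(n_s, 3)
ndim_additive = n_s + 1
ndim_combined = 2 * n_s + 1
min_walkers_additive = 2 * ndim_additive + 2
print(n_feat_isd3, round(n_pix / n_feat_isd3), ndim_additive, ndim_combined, min_walkers_additive)
# -> 34 175 5 9 12
```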
Model selection and diagnostics
LRT (\(H_0\): additive, \(H_1\): combined) — the additive null is rejected at the 5 % level because \(b_0, b_2, b_3 \neq 0\). The test statistic \(\lambda_{\rm LR} = 2[\ln\mathcal{L}_1 - \ln\mathcal{L}_0]\) is positive and \(p < 0.05\).
Null test — the maximum Pearson correlation \(\max_i |r_i|\) between the OLS-corrected weights and the template maps satisfies \(\max|r_i| < 0.50\), confirming partial residual removal.
SNR ranking — the SNR array has shape \((4,)\), all entries are \(\geq 0\), and at least one entry is \(> 0.01\), demonstrating that the real GAIA and LS10 maps carry detectable systematic signal.
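Both diagnostics reduce to a few lines. The sketch below assumes Wilks' theorem, so that \(\lambda_{\rm LR}\) is compared with a \(\chi^2\) distribution whose degrees of freedom equal the number of extra parameters in \(H_1\), here \(9 - 5 = n_s = 4\); for even degrees of freedom \(k\) the \(\chi^2\) tail probability has the closed form \(e^{-x/2}\sum_{j=0}^{k/2-1}(x/2)^j/j!\), which avoids a SciPy dependency. The correlation helper computes the Pearson coefficient between a weight map and each template over the same pixels. All function names are illustrative, not the package's API.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi^2_df with even df: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglike_null, loglike_alt, df):
    """Return (lambda_LR, p-value) with lambda_LR = 2 [ln L1 - ln L0]."""
    stat = 2.0 * (loglike_alt - loglike_null)
    return stat, chi2_sf_even_df(stat, df)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_template_corr(weights, templates):
    """max_i |r_i| between a weight map and each template map."""
    return max(abs(pearson_r(weights, t)) for t in templates)


# The 5 % critical value of chi^2 with 4 degrees of freedom is about 9.49.
print(round(chi2_sf_even_df(9.4877, 4), 4))
```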
Running the real-template tests
Ensure the FITS files are present at their default paths (see
load_real_templates()), then:
conda activate sys_map
pytest tests/test_real_templates.py -v
Expected output:
28 passed in ~62 s
When the files are absent, the tests degrade gracefully and are skipped:
28 skipped (reason: GAIA/LS10 FITS files not found)
Test outcome
The full test suite was executed in the sys_map conda environment
(Python 3.11, JAX 64-bit, scikit-learn ≥ 1.3, real GAIA and LS10 FITS files
present) against the current codebase.
Results:
- 258 passed, 0 failed, 0 errors (excluding test_timing.py).
- test_real_templates.py: 28 passed using real GAIA DR3 and LS DR10 systematic maps; all six methods completed without error on the 5 954-pixel LS10 footprint.
- test_regression.py: 26 passed, including the previously known edge case (TestElasticNet::test_weights_bounded_positive), which now passes.
- The systematic test matrix (scripts/run_systematic_tests.py, 32 configurations) ran to completion; results are documented in Model test matrix.
Status: PASSED — all 258 tests pass with no known failures.