Test suite and validation
This page documents the test philosophy, per-module test descriptions, and how to run the full suite. The tests are organised in three tiers of increasing physical rigor.
Test philosophy
A cosmological estimator can pass all unit tests and still be systematically wrong. For example, a sign error in the distance-modulus formula would not be caught by a “check that phi > 0” test, but it would produce a luminosity function (LF) shifted by several magnitudes from the truth.
For this reason the test suite is structured in three tiers:
| Tier | Purpose | How to spot a failure |
|---|---|---|
| Unit | Correct output shapes, signs, and error conditions | Shape mismatch, negative phi, wrong exception |
| Consistency | Independent estimators agree on the same mock data | Ratio \(\phi_\mathrm{SWML} / \phi_{1/V_\mathrm{max}}\) outside [0.3, 3] |
| Physical recovery | Estimator returns the correct answer on known input | Recovered \(\alpha\) or \(M^*\) deviates by more than \(\sigma\) from truth |
Most existing tests (test_lf_smf.py, test_twopcf_*.py) are in tiers 1–2.
test_recovery.py and test_cosmology_validation.py add tier-3 coverage.
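The sign-error scenario described above can be made concrete with a small sketch. The `abs_mag` helper and its `sign` argument are hypothetical and exist only for illustration; they are not part of the package. Both the correct and the broken version pass a tier-1 style check, and only the comparison against a known answer separates them.

```python
import numpy as np

def abs_mag(app_mag, d_lum_mpc, sign=+1.0):
    """Absolute magnitude M = m - 5 log10(d_L / 10 pc).

    Passing sign=-1.0 injects the sign error (hypothetical helper).
    """
    dist_mod = 5.0 * np.log10(d_lum_mpc * 1.0e6 / 10.0)
    return app_mag - sign * dist_mod

m = np.array([18.0, 19.0, 20.0])
d_l = np.array([100.0, 200.0, 400.0])  # luminosity distances in Mpc

good = abs_mag(m, d_l)
bad = abs_mag(m, d_l, sign=-1.0)

# Tier-1 style check: both versions return finite arrays of the right shape.
unit_ok_good = good.shape == m.shape and bool(np.all(np.isfinite(good)))
unit_ok_bad = bad.shape == m.shape and bool(np.all(np.isfinite(bad)))

# Tier-3 style check: compare against a known truth.
# At d_L = 100 Mpc the distance modulus is exactly 35 mag.
recovery_ok_good = bool(np.isclose(good[0], 18.0 - 35.0))
recovery_ok_bad = bool(np.isclose(bad[0], 18.0 - 35.0))
```

Here the broken version is off by twice the distance modulus, which is exactly the kind of failure that only a known-answer test exposes.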
Running the tests
# Activate the environment first
conda activate sum_stat
# Run all tests
pytest tests/ -v
# Run only the physical recovery tests (slow — generates mock catalogues)
pytest tests/test_recovery.py -v -s
# Run only the cosmology validation tests
pytest tests/test_cosmology_validation.py -v -s
# Run the timing benchmarks
pytest tests/test_benchmarks.py -v -s
Expected total time: ~3–5 minutes (recovery tests generate mock catalogues via
z_at_value, which is the bottleneck).
Per-module test summaries
Catalogue (test_catalogue.py)
| Test | What it checks |
|---|---|
|  | Constructor, shape validation, default weights, optional fields |
|  | Ellipticity shape checks, weight normalisation |
|  | Photo-z calibration table construction |
Luminosity & stellar mass functions (test_lf_smf.py)
| Test class | What it checks |
|---|---|
|  | Output shapes, non-negativity, error ≤ phi, missing abs_mag raises ValueError |
|  | Same for stellar mass; missing log10_mstar raises ValueError |
|  | Convergence, shape normalisation, absolute normalisation matches Vmax |
|  | Cumulative LF is monotone non-decreasing and positive |
|  | Cross-estimator consistency (Vmax vs SWML ratio within [0.5, 2]); \(\lvert\tau\rvert < 5\) for no-evolution sample |
|  | Positivity, peak near \(M^*\), JAX JIT/grad compatible |
|  | Positivity, single-component limit, JAX JIT/grad compatible |
|  | Positive, symmetric, peak at \(\Delta m = 0\) |
|  | Broadens the SMF (higher dispersion after convolution) |
Physical recovery (test_recovery.py) — tier 3
These tests generate mock catalogues from a known Schechter function and verify that the estimators recover the truth within physically motivated tolerances.
| Test | What it verifies |
|---|---|
|  | Integrated \(1/V_\mathrm{max}\) LF is positive and finite; peak within 2.5 mag of \(M^*\) |
|  | Schechter fit to Vmax bins: \(\lvert\hat\alpha - \alpha_\mathrm{true}\rvert < 0.4\), \(\lvert\hat M^* - M^*_\mathrm{true}\rvert < 0.8\) mag |
|  | SWML/Vmax bin ratio within [0.1, 10] in populated bins |
|  | Integrated densities agree within factor 3 |
|  | \(C^-\) LF is positive, finite, non-decreasing |
|  | Total density in range \(10^{-6}\)–\(1\) Mpc\(^{-3}\) |
|  | Integrated SMF is positive and finite |
|  | SMF peak within 1.5 dex of true \(\log M^*\) |
|  |  |
|  | Restricting \(z_{\max,i}\) increases \(\phi\) (smaller \(V_\mathrm{max}\)) |
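The recovery idea can be sketched on a toy problem. The example below is not the package's mock generator or fitter. It assumes a complete sample with no flux limit and a faint-end slope \(\alpha > -1\), so that \(x = L/L^*\) follows a Gamma distribution with shape \(\alpha + 1\), and it uses a simple method-of-moments fit. All function names are hypothetical.

```python
import numpy as np

ALPHA_TRUE = -0.5   # faint-end slope (must exceed -1 for this sampler)
MSTAR_TRUE = -20.5  # characteristic magnitude

def draw_schechter_mags(n, alpha, m_star, rng):
    """Draw absolute magnitudes from a Schechter LF with alpha > -1.

    In units of x = L / L*, the Schechter form x**alpha * exp(-x) is a
    Gamma(alpha + 1) density, so x can be drawn directly.
    """
    x = rng.gamma(shape=alpha + 1.0, scale=1.0, size=n)
    return m_star - 2.5 * np.log10(x)

def fit_schechter_moments(abs_mag):
    """Method-of-moments estimate of (alpha, M*) for a complete sample."""
    lum = 10.0 ** (-0.4 * abs_mag)       # luminosity in arbitrary units
    mean, var = lum.mean(), lum.var(ddof=1)
    alpha_hat = mean**2 / var - 1.0      # Gamma shape k = mean^2 / var
    lstar_hat = var / mean               # Gamma scale = var / mean
    return alpha_hat, -2.5 * np.log10(lstar_hat)

rng = np.random.default_rng(42)
mags = draw_schechter_mags(50_000, ALPHA_TRUE, MSTAR_TRUE, rng)
alpha_hat, mstar_hat = fit_schechter_moments(mags)

# Tolerances borrowed from the Schechter-fit row of the table above.
alpha_ok = abs(alpha_hat - ALPHA_TRUE) < 0.4
mstar_ok = abs(mstar_hat - MSTAR_TRUE) < 0.8
```

A real tier-3 test would additionally apply the survey selection and run the estimator under test; the structure (known truth in, tolerance check out) stays the same.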
Cosmology validation (test_cosmology_validation.py) — tier 3
| Test | What it verifies |
|---|---|
|  | JAX \(\chi(z)\) agrees with analytic EdS solution within 0.1% at \(z \in \{0.1, 0.5, 1, 2\}\) |
|  | Array and scalar paths give identical results |
|  | \(\lvert\chi_\mathrm{JAX}(z) - \chi_\mathrm{astropy}(z)\rvert < 0.5\) Mpc at 20 redshifts |
|  | \(V_c\) agrees within 0.1% for \(z > 0.1\) |
|  | \(D_A\) agrees within 1% |
|  | \(h\) and \(\Omega_m\) extracted correctly |
|  | \(V_c(z)\) strictly increasing from \(z = 0.01\) to \(z = 5\) |
|  | \(V_c(z \approx 0) \approx 0\) |
|  | \(\chi(z) > 0\) for all \(z > 0\) |
|  | For a volume-limited sample: \(\sum 1/V_\mathrm{max} \approx N / V_\mathrm{survey}\) within 1% |
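The Einstein-de Sitter (EdS) comparison works because the comoving distance has a closed form when \(\Omega_m = 1\): \(\chi(z) = (2c/H_0)\,[1 - (1+z)^{-1/2}]\). The sketch below checks a plain NumPy trapezoidal integral against that expression. It stands in for the JAX routine, which is not reproduced here, and the value \(H_0 = 70\) km/s/Mpc is an arbitrary assumption.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s

def chi_eds_analytic(z, h0=70.0):
    """Comoving distance in Mpc for Einstein-de Sitter (Omega_m = 1)."""
    return 2.0 * C_KM_S / h0 * (1.0 - 1.0 / np.sqrt(1.0 + z))

def chi_numeric(z, h0=70.0, omega_m=1.0, n_steps=20_001):
    """Trapezoidal integral of c / H(z') from 0 to z for a flat cosmology."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return C_KM_S / h0 * integral

# Relative error at the redshifts quoted in the table above.
rel_err = {
    z: abs(chi_numeric(z) / chi_eds_analytic(z) - 1.0)
    for z in (0.1, 0.5, 1.0, 2.0)
}
```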
Two-point correlation functions (test_twopcf_*.py)
| Test | What it checks |
|---|---|
|  | \(\hat w(\theta) \ge -1\); correct shape; output shape matches bins |
|  | \(\hat w(\theta) \ge -1\) |
|  | Aggregated pair counts give consistent LS estimate |
|  | Projected \(w_p(r_p) > 0\); output shape; units [Mpc] |
|  | Monopole consistent with \(w_p\); quadrupole sign |
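For reference, the Landy-Szalay (LS) estimator combines normalised pair counts as \(\hat w = (DD - 2DR + RR)/RR\). The sketch below is a stand-alone illustration with a hypothetical `landy_szalay` function and hand-picked pair counts; it does not use the package's pair counter.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimator from raw pair counts per bin.

    dd, rr: unique data-data and random-random pair counts.
    dr: data-random cross pair counts.
    """
    dd = np.asarray(dd, dtype=float) / (n_data * (n_data - 1) / 2.0)
    rr = np.asarray(rr, dtype=float) / (n_rand * (n_rand - 1) / 2.0)
    dr = np.asarray(dr, dtype=float) / (n_data * n_rand)
    return (dd - 2.0 * dr + rr) / rr

# Two bins for 100 data points and 200 randoms:
# bin 0 has no excess data pairs, bin 1 has twice the random expectation.
w = landy_szalay(dd=[495, 990], dr=[2000, 2000], rr=[1990, 1990],
                 n_data=100, n_rand=200)
```

In this toy input the first bin has \(\hat w = 0\) (normalised DD, DR and RR all equal 0.1) and the second has \(\hat w = 1\) (normalised DD is 0.2).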
Covariance (test_covariance.py)
| Test | What it checks |
|---|---|
|  | Correct number of regions; all objects assigned |
|  | Covariance matrix shape; positive diagonal; scales as \((N_\mathrm{jk}-1)/N_\mathrm{jk}\) |
|  | Positive diagonal; consistent normalisation |
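The \((N_\mathrm{jk}-1)/N_\mathrm{jk}\) prefactor is the standard delete-one jackknife normalisation. A minimal sketch follows, assuming the leave-one-out measurements are already stacked into an array; the function name is hypothetical and random numbers stand in for real measurements.

```python
import numpy as np

def jackknife_covariance(samples):
    """Delete-one jackknife covariance.

    samples: array of shape (n_jk, n_bins), one row per leave-one-out estimate.
    """
    samples = np.asarray(samples, dtype=float)
    n_jk = samples.shape[0]
    diff = samples - samples.mean(axis=0)
    return (n_jk - 1.0) / n_jk * diff.T @ diff

rng = np.random.default_rng(0)
jk_estimates = rng.normal(size=(100, 5))  # stand-in for 100 leave-one-out runs
cov = jackknife_covariance(jk_estimates)

# The same matrix expressed through the ordinary sample covariance.
n_jk = jk_estimates.shape[0]
cov_from_numpy = (n_jk - 1) ** 2 / n_jk * np.cov(jk_estimates, rowvar=False)
```

Comparing against `np.cov` in this way is a cheap check that the normalisation has not been dropped or applied twice.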
Lensing (test_lensing_esd.py)
| Test | What it checks |
|---|---|
|  | \(\Sigma_\mathrm{crit}\) is positive, finite, and increases with lens redshift |
N(z) estimation (test_nz.py)
| Test | What it checks |
|---|---|
|  | Output shape, non-negativity, normalisation to unity |
|  | KDE output non-negative; integrates to approximately 1 |
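The normalisation check can be illustrated with a plain histogram. This sketch uses `numpy.histogram` on a toy redshift sample rather than the package's N(z) estimators, and the Gamma-distributed redshifts are an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
z_samples = rng.gamma(shape=3.0, scale=0.2, size=10_000)  # toy redshift sample

edges = np.linspace(0.0, 3.0, 31)
# density=True returns a probability density over the binned range.
nz, _ = np.histogram(z_samples, bins=edges, density=True)

# Integrate the binned density: sum of height times bin width.
integral = np.sum(nz * np.diff(edges))
```

A KDE-based estimate would be integrated on a fine grid instead, which is why that test only requires the result to be approximately 1.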
Benchmark timing (test_benchmarks.py)
These tests check that the core routines meet timing targets on a standard CPU.
They are not run by default; pass the -s flag to see the timing output.
| Benchmark | Input size | Target |
|---|---|---|
| Comoving distance (JAX) | 1 000 redshifts | < 10 ms (after JIT warmup) |
| \(w(\theta)\) Landy-Szalay | 10 000 galaxies, 50 000 randoms | < 30 s |
| \(w_p(r_p)\) projected | 5 000 galaxies, 25 000 randoms | < 60 s |
| Jackknife covariance | \(N_\mathrm{jk} = 100\) sub-surveys | < 5 s |
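The "after JIT warmup" qualifier matters because the first call to a compiled function includes compilation time. One possible timing pattern is sketched below with a NumPy stand-in workload. The helper names are hypothetical, and this is not the code in test_benchmarks.py.

```python
import time
import numpy as np

def time_after_warmup(func, *args, repeats=5):
    """Return the best wall-clock time in seconds, excluding the first call."""
    func(*args)  # warm-up call (absorbs compilation or cache effects)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def toy_distance(z):
    """Stand-in workload: EdS comoving distance in units of c / H0."""
    return 2.0 * (1.0 - 1.0 / np.sqrt(1.0 + z))

elapsed = time_after_warmup(toy_distance, np.linspace(0.01, 5.0, 1_000))
```

With JAX, results are computed asynchronously, so a real benchmark would also need to wait for the result (for example by calling `block_until_ready()` on the returned array) before stopping the clock.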
Test coverage gaps (pre-release)
The table below lists the tier-3 physical recovery and literature comparison tests that still need to be written before the package can be considered production-ready for its four stable estimators.
| Missing test | Priority | Target file | Pass criterion |
|---|---|---|---|
| WPRP physical recovery — recover a power-law wp(rp) γ from a Poisson-sampled anisotropic mock | High |  | \(\lvert\hat\gamma - \gamma_\mathrm{true}\rvert < 0.15\) on scales 0.1–10 Mpc |
| WTHETA physical recovery — recover w(θ) from a Poisson-sampled angular mock | High |  | Power-law slope within 0.15 of truth |
| DeltaSigma physical recovery — recover ΔΣ(rp) from a synthetic NFW profile with known mass | High |  | Recovered M200c within 30% of truth |
| SMF vs COSMOS literature — automate comparison against Ilbert+ (2013) COSMOS2015 SMF in ≥ 2 redshift bins | Medium |  | Δφ/φ < 30% in populated bins (accounting for cosmic variance) |
| WPRP vs GAMA literature — automate comparison against Farrow+ (2015) GAMA wp | Medium |  | Δwp/wp < 50% (field-variance dominated) |
| Jackknife covariance quality — verify JK covariance is PSD and condition number is < threshold | Low |  | All eigenvalues > 0; condition number < 10⁴ |
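As a rough idea of what the slope criterion involves: for a power-law \(\xi(r) \propto r^{-\gamma}\), the projected function scales as \(w_p(r_p) \propto r_p^{1-\gamma}\), so \(\gamma\) can be read off a log-log fit. The sketch below applies this to synthetic \(w_p\) values with 5% log-normal scatter. It skips the mock catalogue and pair counting entirely, and the amplitude, noise level, and function name are all assumptions.

```python
import numpy as np

GAMMA_TRUE = 1.8

def fit_gamma(r_p, w_p):
    """Estimate gamma from the log-log slope, using w_p ~ r_p**(1 - gamma)."""
    slope, _ = np.polyfit(np.log10(r_p), np.log10(w_p), deg=1)
    return 1.0 - slope

rng = np.random.default_rng(7)
r_p = np.logspace(-1.0, 1.0, 12)             # 0.1-10 Mpc
w_true = 50.0 * r_p ** (1.0 - GAMMA_TRUE)    # arbitrary amplitude
w_obs = w_true * rng.lognormal(mean=0.0, sigma=0.05, size=r_p.size)

gamma_hat = fit_gamma(r_p, w_obs)
gamma_ok = abs(gamma_hat - GAMMA_TRUE) < 0.15
```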
Stub test files for the high-priority items are at
tests/test_recovery_clustering.py and tests/test_recovery_lensing.py.
Each test is marked @pytest.mark.skip(reason="not yet implemented") so it
appears in the test output as a reminder without blocking the suite.
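A stub of this kind might look like the following. The test name and docstring are hypothetical; only the skip marker and its reason string are taken from the text above.

```python
import pytest

@pytest.mark.skip(reason="not yet implemented")
def test_wprp_recovers_power_law_slope():
    """Placeholder: recover gamma from a mock with a known w_p(r_p)."""
    raise NotImplementedError
```

pytest reports such a test as skipped, together with the reason string, and never executes the body.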
Adding new tests
When adding a new estimator or modifying an existing one, please add at minimum:
- A unit test (correct output shape and sign, expected exceptions).
- A physical recovery test if the estimator is a statistical estimator (verify on a mock with known truth).
The _make_schechter_lf_cat and _make_double_schechter_smf_cat helpers in
tests/test_recovery.py can be reused to generate controlled mock catalogues.
Use numpy.random.default_rng(seed) with a fixed seed so tests are reproducible.
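The effect of a fixed seed is easy to demonstrate. The generator below is a hypothetical stand-in, not one of the helpers in tests/test_recovery.py.

```python
import numpy as np

def make_toy_catalogue(n, seed):
    """Hypothetical mock generator; a fixed seed gives identical output."""
    rng = np.random.default_rng(seed)
    return {
        "z": rng.uniform(0.01, 0.5, size=n),
        "abs_mag": rng.normal(-20.0, 1.0, size=n),
    }

cat_a = make_toy_catalogue(1_000, seed=123)
cat_b = make_toy_catalogue(1_000, seed=123)  # same seed: identical catalogue
cat_c = make_toy_catalogue(1_000, seed=124)  # different seed: different draws
```

Creating the generator inside the helper, rather than relying on global random state, keeps each test independent of the order in which the suite runs.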