Test suite and validation

This page documents the test philosophy, per-module test descriptions, and how to run the full suite. The tests are organised in three tiers of increasing physical rigor.


Test philosophy

A cosmological estimator can pass all unit tests and still be systematically wrong. For example, a sign error in the distance-modulus formula would not be caught by a “check that phi > 0” test, but it would shift the recovered LF by several magnitudes from the truth.
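The sign-error scenario can be made concrete with a minimal sketch. The function names and the input values below are hypothetical and are not part of the package; K-corrections and extinction are ignored.

```python
import numpy as np


def distance_modulus(d_lum_mpc):
    """Distance modulus mu = 5 log10(d_L / 10 pc), with d_L given in Mpc."""
    return 5.0 * np.log10(d_lum_mpc) + 25.0


def abs_mag_correct(app_mag, d_lum_mpc):
    # M = m - mu
    return app_mag - distance_modulus(d_lum_mpc)


def abs_mag_sign_error(app_mag, d_lum_mpc):
    # Deliberately wrong: adds the distance modulus instead of subtracting it
    return app_mag + distance_modulus(d_lum_mpc)


app_mag = np.array([17.5, 18.0, 19.0])
d_lum = np.array([200.0, 450.0, 900.0])  # Mpc, illustrative values only

m_good = abs_mag_correct(app_mag, d_lum)
m_bad = abs_mag_sign_error(app_mag, d_lum)

# A shape-and-finiteness unit test passes for both versions ...
assert m_good.shape == m_bad.shape == app_mag.shape
assert np.all(np.isfinite(m_good)) and np.all(np.isfinite(m_bad))

# ... yet the two results differ by twice the distance modulus.
shift = m_bad - m_good
```

Only a comparison against a known answer exposes the difference between the two functions, which is what the higher tiers are designed to provide.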

For this reason the test suite is structured in three tiers:

Test tiers

| Tier | Purpose | How to spot a failure |
|------|---------|-----------------------|
| Unit | Correct output shapes, signs, and error conditions | Shape mismatch, negative phi, wrong exception |
| Consistency | Independent estimators agree on the same mock data | Ratio \(\phi_\mathrm{SWML} / \phi_{1/V_\mathrm{max}}\) outside [0.3, 3] |
| Physical recovery | Estimator returns the correct answer on known input | Recovered \(\alpha\) or \(M^*\) deviates by more than \(\sigma\) from truth |

Most existing tests (test_lf_smf.py, test_twopcf_*.py) are in tiers 1–2. test_recovery.py and test_cosmology_validation.py add tier-3 coverage.
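A tier-2 consistency check of the kind described above might look like the following sketch. The helper name `ratio_within` is hypothetical, and the default bounds are taken from the [0.3, 3] range quoted in the tier table; the actual tests may be structured differently.

```python
import numpy as np


def ratio_within(phi_a, phi_b, lo=0.3, hi=3.0):
    """Return True if phi_a / phi_b lies in [lo, hi] in every populated bin.

    A bin counts as populated when both estimates are strictly positive.
    """
    phi_a = np.asarray(phi_a, dtype=float)
    phi_b = np.asarray(phi_b, dtype=float)
    populated = (phi_a > 0) & (phi_b > 0)
    if not populated.any():
        return False  # nothing to compare is treated as a failure
    ratio = phi_a[populated] / phi_b[populated]
    return bool(np.all((ratio >= lo) & (ratio <= hi)))


# Two estimators that agree to within tens of per cent in populated bins
phi_swml = [1.0e-3, 2.0e-3, 0.0]
phi_vmax = [1.2e-3, 1.5e-3, 1.0e-4]
agree = ratio_within(phi_swml, phi_vmax)

# An order-of-magnitude disagreement is flagged
disagree = ratio_within([1.0e-3], [1.0e-2])
```

Restricting the comparison to populated bins avoids division by zero in the sparse bright-end bins.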


Running the tests

# Activate the environment first
conda activate sum_stat

# Run all tests
pytest tests/ -v

# Run only the physical recovery tests (slow — generates mock catalogues)
pytest tests/test_recovery.py -v -s

# Run only the cosmology validation tests
pytest tests/test_cosmology_validation.py -v -s

# Run the timing benchmarks
pytest tests/test_benchmarks.py -v -s

Expected total time: ~3–5 minutes (recovery tests generate mock catalogues via z_at_value, which is the bottleneck).


Per-module test summaries

Catalogue (test_catalogue.py)

Test

What it checks

TestGalaxyCatalogue

Constructor, shape validation, default weights, optional fields

TestShapeCatalogue

Ellipticity shape checks, weight normalisation

TestPhotoZCalibTable

Photo-z calibration table construction

Luminosity & stellar mass functions (test_lf_smf.py)

Test class

What it checks

TestVmaxLF

Output shapes, non-negativity, error ≤ phi, missing abs_mag raises ValueError

TestVmaxSMF

Same for stellar mass; missing log10_mstar raises ValueError

TestSWML

Convergence, shape normalisation, absolute normalisation matches Vmax

TestCMinus

Cumulative LF is monotone non-decreasing and positive

TestMCComparison

Cross-estimator consistency (Vmax vs SWML ratio within [0.5, 2]); \(|\tau| < 5\) for no-evolution sample

TestSchechterMass

Positivity, peak near \(M^*\), JAX JIT/grad compatible

TestDoubleSchechterMass

Positivity, single-component limit, JAX JIT/grad compatible

TestEddingtonKernel

Positive, symmetric, peak at \(\Delta m = 0\)

TestConvolveSmfEddington

Broadens the SMF (higher dispersion after convolution)

Physical recovery (test_recovery.py) — tier 3

These tests generate mock catalogues from a known Schechter function and verify that the estimators recover the truth within physically motivated tolerances.

Test

What it verifies

TestVmaxRecovery::test_recovers_schechter_normalization

Integrated \(1/V_\mathrm{max}\) LF is positive and finite; peak within 2.5 mag of \(M^*\)

TestVmaxRecovery::test_shape_matches_schechter_slope

Schechter fit to Vmax bins: \(|\hat\alpha - \alpha_\mathrm{true}| < 0.4\), \(|\hat M^* - M^*_\mathrm{true}| < 0.8\) mag

TestSWMLRecovery::test_swml_recovers_schechter_shape

SWML/Vmax bin ratio within [0.1, 10] in populated bins

TestSWMLRecovery::test_vmax_swml_integrated_density_consistent

Integrated densities agree within factor 3

TestCminusRecovery::test_cminus_cumulative_positive_and_finite

\(C^-\) LF is positive, finite, non-decreasing

TestCminusRecovery::test_cminus_total_density_order_of_magnitude

Total density in the range \(10^{-6}\) to \(1\) Mpc\(^{-3}\)

TestSMFVmaxRecovery::test_smf_vmax_integrated_density

Integrated SMF is positive and finite

TestSMFVmaxRecovery::test_smf_vmax_peak_near_mstar

SMF peak within 1.5 dex of true \(\log M^*\)

TestVmaxTruncation::test_individual_zmax_truncation_does_not_bias_normalization

z_max_individual = z_max gives numerically identical result to default

TestVmaxTruncation::test_lower_zmax_individual_increases_normalization

Restricting \(z_{\max,i}\) increases \(\phi\) (smaller \(V_\mathrm{max}\))
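The idea behind these recovery tests can be illustrated with a much simpler, self-contained sketch. It is not the package's mock generator: it assumes a volume-limited sample with no flux limit, a faint-end slope \(\alpha > -1\), and a moment estimator in place of a Schechter fit.

```python
import numpy as np

ALPHA_TRUE = -0.5  # assumed slope; must be > -1 for this sampling shortcut
N_GAL = 200_000

rng = np.random.default_rng(42)

# For a Schechter function phi(x) proportional to x**alpha * exp(-x),
# with x = L / L*, the luminosities of a volume-limited sample follow a
# Gamma(alpha + 1, 1) distribution when alpha > -1.
x = rng.gamma(shape=ALPHA_TRUE + 1.0, scale=1.0, size=N_GAL)

# The mean of Gamma(k, 1) is k, so mean(x) - 1 is a moment estimate of alpha.
alpha_hat = x.mean() - 1.0

# Var[Gamma(k, 1)] = k, so the standard error of the mean is sqrt(k / N).
sigma_alpha = np.sqrt((ALPHA_TRUE + 1.0) / N_GAL)

# Recovery criterion: the estimate lies within 5 sigma of the truth.
assert abs(alpha_hat - ALPHA_TRUE) < 5.0 * sigma_alpha
```

The tolerance is tied to the expected statistical scatter rather than chosen by hand, so the check stays meaningful if the mock size changes.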

Cosmology validation (test_cosmology_validation.py) — tier 3

Test

What it verifies

TestEinsteinDeSitter::test_comoving_distance_eds

JAX \(\chi(z)\) agrees with analytic EdS solution within 0.1% at \(z \in \{0.1, 0.5, 1, 2\}\)

TestEinsteinDeSitter::test_comoving_distance_array_eds

Array and scalar paths give identical results

TestPlanck18VsAstropy::test_comoving_distance_matches_astropy

\(|\chi_\mathrm{JAX}(z) - \chi_\mathrm{astropy}(z)| < 0.5\) Mpc at 20 redshifts

TestPlanck18VsAstropy::test_comoving_volume_matches_astropy

\(V_c\) agrees within 0.1% for \(z > 0.1\)

TestPlanck18VsAstropy::test_angular_diameter_distance_vs_astropy

\(D_A\) agrees within 1%

TestPlanck18VsAstropy::test_astropy_to_jax_cosmo_extractor

\(h\) and \(\Omega_m\) extracted correctly

TestComovingVolumeProperties::test_comoving_volume_monotone

\(V_c(z)\) strictly increasing from \(z = 0.01\) to \(z = 5\)

TestComovingVolumeProperties::test_comoving_volume_zero_at_zero

\(V_c(z \approx 0) \approx 0\)

TestComovingVolumeProperties::test_comoving_distance_positive

\(\chi(z) > 0\) for all \(z > 0\)

TestVmaxConsistency::test_vmax_sum_equals_number_density

For a volume-limited sample: \(\sum 1/V_\mathrm{max} \approx N / V_\mathrm{survey}\) within 1%
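The Einstein-de Sitter check relies on the closed-form solution \(\chi(z) = (2c/H_0)\,[1 - (1+z)^{-1/2}]\). The sketch below compares it with a plain NumPy trapezoidal integral rather than with the package's JAX implementation; the function names and the value \(H_0 = 70\) km/s/Mpc are assumptions made for illustration.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s


def chi_eds_analytic(z, h0=70.0):
    """Comoving distance in Mpc for Einstein-de Sitter (Omega_m = 1)."""
    return 2.0 * C_KM_S / h0 * (1.0 - 1.0 / np.sqrt(1.0 + z))


def chi_numeric(z, h0=70.0, omega_m=1.0, n_steps=20_001):
    """Comoving distance in Mpc from a trapezoidal integral of c / H(z).

    Assumes a flat universe with matter and a cosmological constant only.
    """
    zz = np.linspace(0.0, z, n_steps)
    e_z = np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    f = 1.0 / e_z
    dz = zz[1] - zz[0]
    integral = dz * (f.sum() - 0.5 * (f[0] + f[-1]))
    return C_KM_S / h0 * integral


# Same redshifts and 0.1% tolerance as quoted for the EdS test above
for z in (0.1, 0.5, 1.0, 2.0):
    exact = chi_eds_analytic(z)
    approx = chi_numeric(z)
    assert abs(approx / exact - 1.0) < 1e-3
```

An analytic reference of this kind is independent of any third-party cosmology library, so it catches errors that a comparison against a second numerical code could share.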

Two-point correlation functions (test_twopcf_*.py)

Test

What it checks

TestLandySzalayJax

\(\hat w(\theta) \ge -1\); output shape matches bins

TestDavisPeeblesJax

\(\hat w(\theta) \ge -1\)

TestWThetaFromPairCounts

Aggregated pair counts give consistent LS estimate

TestWp

Projected \(w_p(r_p) > 0\); output shape; units [Mpc]

TestLegendreDecompose

Monopole consistent with \(w_p\); quadrupole sign
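For reference, the Landy-Szalay estimator can be written directly in terms of binned pair counts. The sketch below is a generic NumPy version under the assumption that DD and RR are counts of unique pairs and DR counts all data-random pairs; the function name and argument order are hypothetical and may differ from the package's API.

```python
import numpy as np


def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimate (DD - 2 DR + RR) / RR from raw pair counts.

    dd, rr : unique auto-pair counts per bin
    dr     : data-random cross-pair counts per bin
    """
    dd = np.asarray(dd, dtype=float)
    dr = np.asarray(dr, dtype=float)
    rr = np.asarray(rr, dtype=float)
    # Normalise each count by the total number of available pairs
    dd_n = dd / (n_data * (n_data - 1) / 2.0)
    dr_n = dr / (n_data * n_rand)
    rr_n = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n


n_data, n_rand = 100, 1000

# Unclustered case: every normalised count is the same fraction per bin
w_flat = landy_szalay([49.5, 99.0], [1000.0, 2000.0], [4995.0, 9990.0],
                      n_data, n_rand)

# Doubling DD in the first bin gives w = 1 there
w_clustered = landy_szalay([99.0, 99.0], [1000.0, 2000.0], [4995.0, 9990.0],
                           n_data, n_rand)
```

Because the estimator is a ratio of normalised counts, pair counts accumulated over several patches can be summed first and passed in once, provided the totals `n_data` and `n_rand` are handled consistently.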

Covariance (test_covariance.py)

Test

What it checks

TestAssignJackknifeRegions

Correct number of regions; all objects assigned

TestJackknifeFromSubsamples

Covariance matrix shape; positive diagonal; scales as \((N_\mathrm{jk}-1)/N_\mathrm{jk}\)

TestBootstrapCovariance

Positive diagonal; consistent normalisation
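The \((N_\mathrm{jk}-1)/N_\mathrm{jk}\) scaling refers to the standard delete-one jackknife covariance. A minimal NumPy sketch follows; the function name is hypothetical and the input is random numbers standing in for per-region measurements.

```python
import numpy as np


def jackknife_covariance(subsamples):
    """Delete-one jackknife covariance matrix.

    subsamples : array of shape (n_jk, n_bins); row k holds the statistic
                 measured with jackknife region k removed.
    """
    x = np.asarray(subsamples, dtype=float)
    n_jk = x.shape[0]
    diff = x - x.mean(axis=0)
    return (n_jk - 1.0) / n_jk * (diff.T @ diff)


rng = np.random.default_rng(0)
n_jk, n_bins = 50, 4
subsamples = 1.0 + 0.01 * rng.standard_normal((n_jk, n_bins))

cov = jackknife_covariance(subsamples)
eigvals = np.linalg.eigvalsh(cov)

assert cov.shape == (n_bins, n_bins)
assert np.all(np.diag(cov) > 0)  # positive diagonal
assert np.all(eigvals > 0)       # positive definite for n_jk > n_bins
print("condition number:", np.linalg.cond(cov))
```

The eigenvalue and condition-number lines correspond to the kind of quality check that is useful before inverting the matrix in a likelihood.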

Lensing (test_lensing_esd.py)

Test

What it checks

TestShearCalib

\(\Sigma_\mathrm{crit}\) is positive, finite, and increases with lens redshift

N(z) estimation (test_nz.py)

Test

What it checks

TestNzHistogram

Output shape, non-negativity, normalisation to unity

TestNzKde

KDE output non-negative; integrates to approximately 1
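The normalisation property checked for the histogram estimator can be sketched as follows. The helper `nz_histogram` is a hypothetical stand-in written with plain NumPy, not the package's function.

```python
import numpy as np


def nz_histogram(z, bin_edges):
    """Return bin centres and a histogram n(z) that integrates to one."""
    counts, edges = np.histogram(z, bins=bin_edges)
    widths = np.diff(edges)
    total = counts.sum()
    if total == 0:
        raise ValueError("no redshifts fall inside the bin range")
    nz = counts / (total * widths)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, nz


rng = np.random.default_rng(7)
z = rng.uniform(0.0, 1.0, size=5_000)
edges = np.linspace(0.0, 1.0, 21)

centres, nz = nz_histogram(z, edges)
integral = np.sum(nz * np.diff(edges))
```

Dividing by the bin widths makes the result a density, so the integral stays at one even when the bins are not uniform.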


Benchmark timing (test_benchmarks.py)

These tests check that the core routines meet timing targets on a standard CPU. They are not run by default, and the -s flag is needed to see the timing output.

| Benchmark | Input size | Target |
|-----------|------------|--------|
| Comoving distance (JAX) | 1 000 redshifts | < 10 ms (after JIT warmup) |
| \(w(\theta)\) Landy-Szalay | 10 000 galaxies, 50 000 randoms | < 30 s |
| \(w_p(r_p)\) projected | 5 000 galaxies, 25 000 randoms | < 60 s |
| Jackknife covariance | \(N_\mathrm{jk} = 100\) sub-surveys | < 5 s |
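A timing test with a warmup call might be structured like the sketch below. The helper `time_call` and the stand-in workload are hypothetical; a real benchmark would call the package's routine instead.

```python
import time

import numpy as np


def time_call(func, *args, n_warmup=1, n_repeat=5):
    """Best wall-clock time in seconds over n_repeat calls, after warmup."""
    for _ in range(n_warmup):
        func(*args)  # absorbs one-off costs such as JIT compilation
    best = float("inf")
    for _ in range(n_repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def test_stand_in_benchmark():
    z = np.linspace(0.01, 2.0, 1_000)
    # Stand-in workload with the same input size as the first table row
    elapsed = time_call(lambda zz: np.cumsum((1.0 + zz) ** -1.5), z)
    print(f"1000 redshifts: {elapsed * 1e3:.3f} ms")  # visible with -s
    assert elapsed < 10e-3


test_stand_in_benchmark()
```

When timing a JAX function, the timed call should also wait for the computation to finish, for example by calling `.block_until_ready()` on the returned array, because JAX dispatches work asynchronously.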


Test coverage gaps (pre-release)

The table below lists the tier-3 physical recovery and literature comparison tests that still need to be written before the package can be considered production-ready for its four stable estimators.

| Missing test | Priority | Target file | Pass criterion |
|--------------|----------|-------------|----------------|
| WPRP physical recovery: recover the slope γ of a power-law wp(rp) from a Poisson-sampled anisotropic mock | High | tests/test_recovery_clustering.py | \(\lvert\hat\gamma - \gamma_\mathrm{true}\rvert < 0.15\) on scales 0.1–10 Mpc |
| WTHETA physical recovery: recover w(θ) from a Poisson-sampled angular mock | High | tests/test_recovery_clustering.py | Power-law slope within 0.15 of truth |
| DeltaSigma physical recovery: recover ΔΣ(rp) from a synthetic NFW profile with known mass | High | tests/test_recovery_lensing.py | Recovered M200c within 30% of truth |
| SMF vs COSMOS literature: automate comparison against Ilbert+ (2013) COSMOS2015 SMF in ≥ 2 redshift bins | Medium | tests/test_literature_smf.py or docs/scripts/ | Δφ/φ < 30% in populated bins (accounting for cosmic variance) |
| WPRP vs GAMA literature: automate comparison against Farrow+ (2015) GAMA wp | Medium | tests/test_literature_wprp.py | Δwp/wp < 50% (field-variance dominated) |
| Jackknife covariance quality: verify JK covariance is PSD and condition number is < threshold | Low | tests/test_covariance.py | All eigenvalues > 0; condition number < 10⁴ |

Stub test files for the high-priority items are at tests/test_recovery_clustering.py and tests/test_recovery_lensing.py. Each test is marked @pytest.mark.skip(reason="not yet implemented") so it appears in the test output as a reminder without blocking the suite.
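A stub of this kind could look like the following sketch. The test name and docstring are hypothetical; only the skip marker and its reason string are taken from the description above.

```python
import pytest


@pytest.mark.skip(reason="not yet implemented")
def test_wprp_recovers_power_law_slope():
    """Placeholder for the wp(rp) power-law recovery test.

    Intended criterion: |gamma_hat - gamma_true| < 0.15 on 0.1-10 Mpc.
    """
    raise NotImplementedError
```

pytest reports such a test as skipped and prints the reason when run with `-rs`, so the gap stays visible in every test run.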


Adding new tests

When adding a new estimator or modifying an existing one, please add at minimum:

  1. A unit test (correct output shape and sign, expected exceptions).

  2. A physical recovery test if the estimator is a statistical estimator (verify on a mock with known truth).

The _make_schechter_lf_cat and _make_double_schechter_smf_cat helpers in tests/test_recovery.py can be reused to generate controlled mock catalogues.

Use numpy.random.default_rng(seed) with a fixed seed so tests are reproducible.
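The effect of a fixed seed can be checked directly. In the sketch below, `make_mock_redshifts` is a hypothetical helper used only for illustration; it is not one of the helpers in tests/test_recovery.py.

```python
import numpy as np


def make_mock_redshifts(n_gal, seed=12345):
    """Hypothetical helper: uniform mock redshifts from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.01, 0.5, size=n_gal)


a = make_mock_redshifts(1_000)
b = make_mock_redshifts(1_000)
c = make_mock_redshifts(1_000, seed=54321)

# Same seed gives bit-identical mocks; a different seed does not
assert np.array_equal(a, b)
assert not np.array_equal(a, c)
```

Creating the generator inside the helper, rather than sharing a module-level one, keeps each mock independent of the order in which tests run.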