Test suite and validation

This page documents the test philosophy, describes the tests for each module, and explains how to run the full suite. The tests are organised in three tiers of increasing physical rigour.

Test philosophy 

A cosmological estimator can pass all unit tests and still be systematically wrong. For example, a sign error in the distance-modulus formula would not be caught by a "check that phi > 0" test, yet it would shift the recovered LF by several magnitudes from the truth.

For this reason the test suite is structured in three tiers:

Test tiers
Tier	Purpose	How to spot a failure
Unit	Correct output shapes, signs, and error conditions	Shape mismatch, negative phi, wrong exception
Consistency	Independent estimators agree on the same mock data	Ratio \(\phi_\mathrm{SWML} / \phi_{1/V_\mathrm{max}}\) outside [0.3, 3]
Physical recovery	Estimator returns the correct answer on known input	Recovered \(\alpha\) or \(M^*\) deviates from the truth by more than the stated tolerance

Most existing tests (test_lf_smf.py, test_twopcf_*.py) are in tiers 1–2. test_recovery.py and test_cosmology_validation.py add tier-3 coverage.
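
The difference between the tiers can be illustrated with a minimal sketch. It is not code from the package: the helpers `abs_mag`, `abs_mag_buggy`, and `toy_phi`, as well as the mock parameters, are assumptions made for illustration. A deliberate sign error in the distance modulus passes a tier-1 positivity check but fails a tier-3 recovery check.

```python
import numpy as np

def distance_modulus(d_lum_mpc):
    """Distance modulus mu = 5 log10(d_L / Mpc) + 25."""
    return 5.0 * np.log10(d_lum_mpc) + 25.0

def abs_mag(m_app, d_lum_mpc):
    """Correct conversion: M = m - mu."""
    return m_app - distance_modulus(d_lum_mpc)

def abs_mag_buggy(m_app, d_lum_mpc):
    """Deliberately wrong sign: M = m + mu."""
    return m_app + distance_modulus(d_lum_mpc)

def toy_phi(abs_mags, n_bins=20):
    """Toy LF: counts per unit magnitude (hypothetical, no volume weighting)."""
    counts, edges = np.histogram(abs_mags, bins=n_bins)
    return counts / np.diff(edges)

# Mock with known truth: absolute magnitudes and luminosity distances in Mpc.
rng = np.random.default_rng(0)
M_true = rng.normal(-20.5, 0.8, size=1000)
d_lum = rng.uniform(100.0, 500.0, size=1000)
m_app = M_true + distance_modulus(d_lum)

M_good = abs_mag(m_app, d_lum)
M_bad = abs_mag_buggy(m_app, d_lum)

# Tier 1 (unit): phi is non-negative for both versions, so the bug is invisible.
tier1_good = bool(np.all(toy_phi(M_good) >= 0))
tier1_bad = bool(np.all(toy_phi(M_bad) >= 0))

# Tier 3 (recovery): only the correct version reproduces the input magnitudes.
# In this toy setup the buggy version is offset by 2 * mu.
tier3_good = bool(abs(np.median(M_good) - np.median(M_true)) < 0.1)
tier3_bad = bool(abs(np.median(M_bad) - np.median(M_true)) < 0.1)

print(tier1_good, tier1_bad, tier3_good, tier3_bad)
```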

Running the tests 

# Activate the environment first
conda activate sum_stat

# Run all tests
pytest tests/ -v

# Run only the physical recovery tests (slow — generates mock catalogues)
pytest tests/test_recovery.py -v -s

# Run only the cosmology validation tests
pytest tests/test_cosmology_validation.py -v -s

# Run the timing benchmarks
pytest tests/test_benchmarks.py -v -s

Expected total time: ~3–5 minutes (recovery tests generate mock catalogues via z_at_value, which is the bottleneck).

Per-module test summaries 

Catalogue (`test_catalogue.py`)

Test	What it checks
`TestGalaxyCatalogue`	Constructor, shape validation, default weights, optional fields
`TestShapeCatalogue`	Ellipticity shape checks, weight normalisation
`TestPhotoZCalibTable`	Photo-z calibration table construction

Luminosity & stellar mass functions (`test_lf_smf.py`)

Test class	What it checks
`TestVmaxLF`	Output shapes, non-negativity, error ≤ phi, missing abs_mag raises ValueError
`TestVmaxSMF`	Same for stellar mass; missing log10_mstar raises ValueError
`TestSWML`	Convergence, shape normalisation, absolute normalisation matches Vmax
`TestCMinus`	Cumulative LF is monotone non-decreasing and positive
`TestMCComparison`	Cross-estimator consistency (Vmax vs SWML ratio within [0.5, 2]); \(\|\tau\| < 5\) for no-evolution sample
`TestSchechterMass`	Positivity, peak near \(M^*\), JAX JIT/grad compatible
`TestDoubleSchechterMass`	Positivity, single-component limit, JAX JIT/grad compatible
`TestEddingtonKernel`	Positive, symmetric, peak at \(\Delta m = 0\)
`TestConvolveSmfEddington`	Broadens the SMF (higher dispersion after convolution)
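
The Eddington-kernel checks in the last two rows can be sketched as follows. This assumes a Gaussian kernel in \(\Delta m\) (log-mass error) and a single Schechter function in log mass; `eddington_kernel`, `schechter_logmass`, and `dispersion` are hypothetical stand-ins and may not match the package's own signatures. The slope \(\alpha = -0.5\) is chosen only so that the toy SMF peaks inside the grid.

```python
import numpy as np

def eddington_kernel(delta_m, sigma):
    """Gaussian scatter kernel in log mass (assumed form)."""
    return np.exp(-0.5 * (delta_m / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def schechter_logmass(logm, logm_star, alpha, phi_star):
    """Schechter function per dex of stellar mass."""
    x = 10.0 ** (logm - logm_star)
    return np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

def dispersion(logm, phi):
    """phi-weighted standard deviation of log mass."""
    w = phi / phi.sum()
    mean = np.sum(w * logm)
    return np.sqrt(np.sum(w * (logm - mean) ** 2))

# Uniform log-mass grid and intrinsic SMF (illustrative parameters).
logm = np.linspace(6.0, 13.0, 701)
dm = logm[1] - logm[0]
phi = schechter_logmass(logm, logm_star=10.8, alpha=-0.5, phi_star=1e-3)

# Kernel sampled on a symmetric, odd-length grid so the convolution does not shift phi.
half = 150
delta = np.arange(-half, half + 1) * dm
kern = eddington_kernel(delta, sigma=0.3)

# Discrete convolution; the factor dm approximates the integral over delta_m.
phi_obs = np.convolve(phi, kern, mode="same") * dm

print(dispersion(logm, phi), dispersion(logm, phi_obs))
```

In this setup the convolved SMF has a larger dispersion than the intrinsic one, and the steep high-mass end is boosted, which is the usual signature of Eddington bias.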

Physical recovery (`test_recovery.py`) — tier 3

These tests generate mock catalogues from a known Schechter function and verify that the estimators recover the truth within physically motivated tolerances.

Test	What it verifies
`TestVmaxRecovery::test_recovers_schechter_normalization`	Integrated \(1/V_\mathrm{max}\) LF is positive and finite; peak within 2.5 mag of \(M^*\)
`TestVmaxRecovery::test_shape_matches_schechter_slope`	Schechter fit to Vmax bins: \(\|\hat\alpha - \alpha_\mathrm{true}\| < 0.4\), \(\|\hat M^* - M^*_\mathrm{true}\| < 0.8\) mag
`TestSWMLRecovery::test_swml_recovers_schechter_shape`	SWML/Vmax bin ratio within [0.1, 10] in populated bins
`TestSWMLRecovery::test_vmax_swml_integrated_density_consistent`	Integrated densities agree within factor 3
`TestCminusRecovery::test_cminus_cumulative_positive_and_finite`	\(C^-\) LF is positive, finite, non-decreasing
`TestCminusRecovery::test_cminus_total_density_order_of_magnitude`	Total density in range \(10^{-6}\)–\(1\) \(\mathrm{Mpc}^{-3}\)
`TestSMFVmaxRecovery::test_smf_vmax_integrated_density`	Integrated SMF is positive and finite
`TestSMFVmaxRecovery::test_smf_vmax_peak_near_mstar`	SMF peak within 1.5 dex of true \(\log M^*\)
`TestVmaxTruncation::test_individual_zmax_truncation_does_not_bias_normalization`	`z_max_individual = z_max` gives numerically identical result to default
`TestVmaxTruncation::test_lower_zmax_individual_increases_normalization`	Restricting \(z_{\max,i}\) increases \(\phi\) (smaller \(V_\mathrm{max}\))
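
The pattern behind these tests (draw a mock from a known Schechter function, run an estimator, compare with the input) can be sketched in a few lines. This is a simplified illustration, not the package's method: it assumes a volume-limited sample with no flux limit and \(\alpha > -1\), in which case \(L/L^*\) follows a gamma distribution with shape \(\alpha + 1\), and it swaps the \(1/V_\mathrm{max}\) and SWML estimators for a method-of-moments fit. All names and tolerances are illustrative.

```python
import numpy as np

ALPHA_TRUE = -0.5   # faint-end slope (must be > -1 for this shortcut)
L_STAR_TRUE = 1.0   # characteristic luminosity, arbitrary units

# Draw luminosities: Schechter in L is gamma(shape=alpha+1, scale=L*).
rng = np.random.default_rng(42)
lum = rng.gamma(shape=ALPHA_TRUE + 1.0, scale=L_STAR_TRUE, size=200_000)

# Method of moments for a gamma distribution:
#   mean = k * theta, var = k * theta**2  ->  theta = var / mean, k = mean**2 / var
mean, var = lum.mean(), lum.var()
l_star_hat = var / mean
alpha_hat = mean ** 2 / var - 1.0

# Recovery check against the known input, with an illustrative tolerance.
alpha_ok = bool(abs(alpha_hat - ALPHA_TRUE) < 0.05)
l_star_ok = bool(abs(l_star_hat - L_STAR_TRUE) < 0.05)

print(f"alpha_hat = {alpha_hat:.3f}, L*_hat = {l_star_hat:.3f}")
```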

Cosmology validation (`test_cosmology_validation.py`) — tier 3

Test	What it verifies
`TestEinsteinDeSitter::test_comoving_distance_eds`	JAX \(\chi(z)\) agrees with analytic EdS solution within 0.1% at \(z \in \{0.1, 0.5, 1, 2\}\)
`TestEinsteinDeSitter::test_comoving_distance_array_eds`	Array and scalar paths give identical results
`TestPlanck18VsAstropy::test_comoving_distance_matches_astropy`	\(\|\chi_\mathrm{JAX}(z) - \chi_\mathrm{astropy}(z)\| < 0.5\) Mpc at 20 redshifts
`TestPlanck18VsAstropy::test_comoving_volume_matches_astropy`	\(V_c\) agrees within 0.1% for \(z > 0.1\)
`TestPlanck18VsAstropy::test_angular_diameter_distance_vs_astropy`	\(D_A\) agrees within 1%
`TestPlanck18VsAstropy::test_astropy_to_jax_cosmo_extractor`	\(h\) and \(\Omega_m\) extracted correctly
`TestComovingVolumeProperties::test_comoving_volume_monotone`	\(V_c(z)\) strictly increasing from \(z = 0.01\) to \(z = 5\)
`TestComovingVolumeProperties::test_comoving_volume_zero_at_zero`	\(V_c(z \approx 0) \approx 0\)
`TestComovingVolumeProperties::test_comoving_distance_positive`	\(\chi(z) > 0\) for all \(z > 0\)
`TestVmaxConsistency::test_vmax_sum_equals_number_density`	For a volume-limited sample: \(\sum 1/V_\mathrm{max} \approx N / V_\mathrm{survey}\) within 1%
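
The Einstein-de Sitter check has a closed form: for a flat, matter-only universe, \(E(z) = (1+z)^{3/2}\) and \(\chi(z) = \frac{2c}{H_0}\left(1 - (1+z)^{-1/2}\right)\). The sketch below compares that expression with a plain trapezoidal integration. It stands in for the JAX implementation, which is not reproduced here; `chi_numeric` and the value of \(H_0\) are assumptions for illustration.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s
H0 = 70.0             # km/s/Mpc, illustrative value

def chi_eds_analytic(z):
    """Analytic EdS comoving distance in Mpc."""
    return 2.0 * C_KM_S / H0 * (1.0 - 1.0 / np.sqrt(1.0 + z))

def chi_numeric(z, n_steps=2001):
    """Trapezoidal integral of c / (H0 * E(z)) with E(z) = (1+z)^1.5."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = (1.0 + zz) ** -1.5
    dz = zz[1] - zz[0]
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return C_KM_S / H0 * integral

# Relative error at the redshifts listed in the table, against a 0.1% tolerance.
rel_err = {z: abs(chi_numeric(z) / chi_eds_analytic(z) - 1.0)
           for z in (0.1, 0.5, 1.0, 2.0)}
all_within_tolerance = all(err < 1e-3 for err in rel_err.values())
print(rel_err, all_within_tolerance)
```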

Two-point correlation functions (`test_twopcf_*.py`)

Test	What it checks
`TestLandySzalayJax`	\(\hat w(\theta) \ge -1\); output shape matches bins
`TestDavisPeeblesJax`	\(\hat w(\theta) \ge -1\)
`TestWThetaFromPairCounts`	Aggregated pair counts give consistent LS estimate
`TestWp`	Projected \(w_p(r_p) > 0\); output shape; units [Mpc]
`TestLegendreDecompose`	Monopole consistent with \(w_p\); quadrupole sign
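
For reference, the Landy-Szalay estimator computed from binned pair counts is \(\hat w = (DD - 2DR + RR)/RR\), with each count normalised by the total number of pairs of that type. The sketch below is a NumPy illustration, not the package's JAX routine; `landy_szalay` is a hypothetical name, and it assumes that `dd` and `rr` count each unordered pair once and that `rr` is non-zero in every bin.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimator from raw pair counts per bin."""
    dd = np.asarray(dd, dtype=float)
    dr = np.asarray(dr, dtype=float)
    rr = np.asarray(rr, dtype=float)
    # Normalise by the total number of available pairs of each type.
    dd_n = dd / (n_data * (n_data - 1) / 2.0)
    dr_n = dr / (n_data * n_rand)
    rr_n = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n

# Hand-built counts for 100 data points and 200 randoms in three bins:
# bin 0 is clustered, bin 1 matches the randoms, bin 2 has no data pairs.
n_data, n_rand = 100, 200
dd = [990, 990, 0]
dr = [2000, 4000, 6000]
rr = [1990, 3980, 5970]

w = landy_szalay(dd, dr, rr, n_data, n_rand)
print(w)  # approximately [1, 0, -1]
```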

Covariance (`test_covariance.py`)

Test	What it checks
`TestAssignJackknifeRegions`	Correct number of regions; all objects assigned
`TestJackknifeFromSubsamples`	Covariance matrix shape; positive diagonal; scales as \((N_\mathrm{jk}-1)/N_\mathrm{jk}\)
`TestBootstrapCovariance`	Positive diagonal; consistent normalisation
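
The \((N_\mathrm{jk}-1)/N_\mathrm{jk}\) scaling can be checked on a case with a known answer: for the sample mean, the delete-one jackknife covariance equals the sample covariance divided by \(N\). The function `jackknife_covariance` below is a hypothetical stand-in written for this sketch, not the package's implementation.

```python
import numpy as np

def jackknife_covariance(loo_estimates):
    """Delete-one jackknife covariance from an (n_jk, n_bins) array."""
    x = np.asarray(loo_estimates, dtype=float)
    n_jk = x.shape[0]
    diff = x - x.mean(axis=0)
    return (n_jk - 1) / n_jk * (diff.T @ diff)

# Toy data: 100 sub-surveys, 4 bins, statistic = mean over sub-surveys.
rng = np.random.default_rng(1)
n_jk, n_bins = 100, 4
y = rng.normal(size=(n_jk, n_bins))

# Leave-one-out means: drop each sub-survey in turn.
loo_means = (y.sum(axis=0) - y) / (n_jk - 1)

cov_jk = jackknife_covariance(loo_means)
cov_expected = np.cov(y, rowvar=False) / n_jk  # covariance of the mean

print(np.allclose(cov_jk, cov_expected))
```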

Lensing (`test_lensing_esd.py`)

Test	What it checks
`TestShearCalib`	\(\Sigma_\mathrm{crit}\) is positive, finite, and increases with lens redshift

N(z) estimation (`test_nz.py`)

Test	What it checks
`TestNzHistogram`	Output shape, non-negativity, normalisation to unity
`TestNzKde`	KDE output non-negative; integrates to approximately 1
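
The three histogram properties (shape, non-negativity, unit integral) can be sketched with plain NumPy. `nz_histogram` here is an illustrative stand-in whose signature is an assumption, and the mock redshift distribution is arbitrary.

```python
import numpy as np

def nz_histogram(z, edges, weights=None):
    """Histogram estimate of n(z), normalised so that it integrates to 1."""
    counts, _ = np.histogram(z, bins=edges, weights=weights)
    widths = np.diff(edges)
    return counts / (counts.sum() * widths)

# Mock redshifts and a uniform binning.
rng = np.random.default_rng(7)
z = rng.gamma(shape=3.0, scale=0.2, size=5000)
edges = np.linspace(0.0, 3.0, 31)

nz = nz_histogram(z, edges)
integral = np.sum(nz * np.diff(edges))
print(nz.shape, integral)
```

Objects outside the bin range are ignored by `np.histogram`, so the normalisation refers to the objects that fall inside the bins.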

Benchmark timing (`test_benchmarks.py`)

These tests check that the core routines meet timing targets on a standard CPU. They are not run by default; pass the -s flag to see the timing output.

Benchmark	Input size	Target
Comoving distance (JAX)	1 000 redshifts	< 10 ms (after JIT warmup)
\(w(\theta)\) Landy-Szalay	10 000 galaxies, 50 000 randoms	< 30 s
\(w_p(r_p)\) projected	5 000 galaxies, 25 000 randoms	< 60 s
Jackknife covariance	\(N_\mathrm{jk} = 100\) sub-surveys	< 5 s
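
A timing check of this kind might be structured as below. This is a sketch under stated assumptions: `best_time` is a hypothetical helper, and a NumPy Einstein-de Sitter distance stands in for the JAX routine. The first call is excluded from the measurement, which is where JIT compilation would happen in the JAX case.

```python
import time
import numpy as np

TARGET_S = 0.010  # 10 ms target for 1 000 redshifts

def eds_distance(z, h0=70.0):
    """Stand-in workload: analytic EdS comoving distance in Mpc."""
    c_km_s = 299_792.458
    return 2.0 * c_km_s / h0 * (1.0 - 1.0 / np.sqrt(1.0 + z))

def best_time(func, *args, repeats=5):
    """Best wall-clock time over several calls, after one warm-up call."""
    func(*args)  # warm-up, excluded from the timing
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        times.append(time.perf_counter() - start)
    return min(times)

z = np.linspace(0.01, 3.0, 1000)
best = best_time(eds_distance, z)
meets_target = best < TARGET_S
print(f"best of 5: {best * 1e3:.3f} ms, meets target: {meets_target}")
```

With JAX, the timed call would also need to wait for the asynchronous result, for example by calling `block_until_ready()` on the output, otherwise only the dispatch time is measured.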

Test coverage gaps (pre-release)

The table below lists the tier-3 physical recovery and literature comparison tests that are missing before the package can be considered production-ready for its four stable estimators.

Missing test	Priority	Target file	Pass criterion
WPRP physical recovery: recover the slope γ of a power-law w_p(r_p) from a Poisson-sampled anisotropic mock	High	`tests/test_recovery_clustering.py`	\(\|\hat\gamma - \gamma_\mathrm{true}\| < 0.15\) on scales 0.1–10 Mpc
WTHETA physical recovery: recover w(θ) from a Poisson-sampled angular mock	High	`tests/test_recovery_clustering.py`	Power-law slope within 0.15 of truth
DeltaSigma physical recovery: recover ΔΣ(r_p) from a synthetic NFW profile with known mass	High	`tests/test_recovery_lensing.py`	Recovered M_200c within 30% of truth
SMF vs COSMOS literature: automate comparison against Ilbert+ (2013) COSMOS2015 SMF in ≥ 2 redshift bins	Medium	`tests/test_literature_smf.py` or `docs/scripts/`	Δφ/φ < 30% in populated bins (accounting for cosmic variance)
WPRP vs GAMA literature: automate comparison against Farrow+ (2015) GAMA w_p	Medium	`tests/test_literature_wprp.py`	Δw_p/w_p < 50% (field-variance dominated)
Jackknife covariance quality: verify that the JK covariance is positive definite and that its condition number is below a threshold	Low	`tests/test_covariance.py`	All eigenvalues > 0; condition number < 10⁴

Stub test files for the high-priority items are at tests/test_recovery_clustering.py and tests/test_recovery_lensing.py. Each test is marked @pytest.mark.skip(reason="not yet implemented") so it appears in the test output as a reminder without blocking the suite.

Adding new tests 

When adding a new estimator or modifying an existing one, please add at minimum:

A unit test (correct output shape and sign, expected exceptions).
A physical recovery test if the estimator is a statistical estimator (verify on a mock with known truth).

The _make_schechter_lf_cat and _make_double_schechter_smf_cat helpers in tests/test_recovery.py can be reused to generate controlled mock catalogues.

Use numpy.random.default_rng(seed) with a fixed seed so tests are reproducible.
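
A minimal sketch of the fixed-seed pattern is shown below. The helper `make_mock_magnitudes` is hypothetical and much simpler than the helpers in tests/test_recovery.py, whose signatures are not reproduced here.

```python
import numpy as np

def make_mock_magnitudes(n, seed):
    """Hypothetical mock generator: Gaussian absolute magnitudes."""
    rng = np.random.default_rng(seed)  # local generator, no global state
    return rng.normal(loc=-20.5, scale=1.0, size=n)

# The same seed gives bit-identical mocks; a different seed does not.
a = make_mock_magnitudes(1000, seed=12345)
b = make_mock_magnitudes(1000, seed=12345)
c = make_mock_magnitudes(1000, seed=54321)

print(np.array_equal(a, b), np.array_equal(a, c))
```

Creating the generator inside the helper keeps each test independent of the order in which tests run.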