.. _testing:

Test suite and validation
==========================

This page documents the test philosophy, per-module test descriptions, and how to run the full suite. The tests are organised in three tiers of increasing physical rigor.

.. contents:: Contents
   :depth: 2
   :local:

----

Test philosophy
----------------

A cosmological estimator can pass all unit tests and still be systematically wrong. For example, a sign error in the distance-modulus formula would not be caught by a "check that phi > 0" test, yet it would produce an LF shifted by several magnitudes from the truth.

For this reason the test suite is structured in three tiers:

.. list-table:: Test tiers
   :header-rows: 1
   :widths: 20 30 50

   * - Tier
     - Purpose
     - How to spot a failure
   * - **Unit**
     - Correct output shapes, signs, and error conditions
     - Shape mismatch, negative phi, wrong exception
   * - **Consistency**
     - Independent estimators agree on the same mock data
     - Ratio :math:`\phi_\mathrm{SWML} / \phi_{1/V_\mathrm{max}}` outside [0.3, 3]
   * - **Physical recovery**
     - Estimator returns the *correct* answer on known input
     - Recovered :math:`\alpha` or :math:`M^*` deviates by more than :math:`\sigma` from truth

Most existing tests (``test_lf_smf.py``, ``test_twopcf_*.py``) are in tiers 1–2. ``test_recovery.py`` and ``test_cosmology_validation.py`` add tier-3 coverage.

----

Running the tests
------------------

.. code-block:: bash

   # Activate the environment first
   conda activate sum_stat

   # Run all tests
   pytest tests/ -v

   # Run only the physical recovery tests (slow: generates mock catalogues)
   pytest tests/test_recovery.py -v -s

   # Run only the cosmology validation tests
   pytest tests/test_cosmology_validation.py -v -s

   # Run the timing benchmarks
   pytest tests/test_benchmarks.py -v -s

Expected total time: ~3–5 minutes (recovery tests generate mock catalogues via ``z_at_value``, which is the bottleneck).

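The tier distinction described under *Test philosophy* can be illustrated with a toy example. The sketch below is not part of the test suite: every function name in it is hypothetical, and the mock is a plain NumPy draw with no selection effects. It shows a deliberately sign-flipped distance modulus passing a unit-style (tier 1) check while failing a recovery-style (tier 3) check against the known input.

```python
import numpy as np


def distance_modulus(d_l_mpc):
    """mu = 5 log10(d_L / 10 pc), with d_L given in Mpc."""
    return 5.0 * np.log10(d_l_mpc) + 25.0


def abs_mag_correct(app_mag, d_l_mpc):
    return app_mag - distance_modulus(d_l_mpc)


def abs_mag_sign_error(app_mag, d_l_mpc):
    # Deliberate bug: the distance modulus is added instead of subtracted.
    return app_mag + distance_modulus(d_l_mpc)


def tier1_check(abs_mag, n_expected):
    """Unit-style check: right shape and finite values only."""
    return abs_mag.shape == (n_expected,) and bool(np.all(np.isfinite(abs_mag)))


def tier3_check(abs_mag, truth, tol_mag=0.01):
    """Recovery-style check: the known input must come back within tol_mag."""
    return bool(np.max(np.abs(abs_mag - truth)) < tol_mag)


# Toy mock with known truth (all values are illustrative assumptions).
rng = np.random.default_rng(2024)
n_gal = 500
true_abs_mag = rng.normal(-21.0, 0.7, size=n_gal)
d_l_mpc = rng.uniform(50.0, 2000.0, size=n_gal)
app_mag = true_abs_mag + distance_modulus(d_l_mpc)

results = {}
for name, func in [("correct", abs_mag_correct), ("sign error", abs_mag_sign_error)]:
    recovered = func(app_mag, d_l_mpc)
    results[name] = (
        tier1_check(recovered, n_gal),
        tier3_check(recovered, true_abs_mag),
    )
    print(f"{name:>10}: tier 1 passes = {results[name][0]}, "
          f"tier 3 passes = {results[name][1]}")
```

Both variants return finite arrays of the right shape, so only the comparison against the known input separates them.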
----

Per-module test summaries
--------------------------

Catalogue (``test_catalogue.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Test
     - What it checks
   * - ``TestGalaxyCatalogue``
     - Constructor, shape validation, default weights, optional fields
   * - ``TestShapeCatalogue``
     - Ellipticity shape checks, weight normalisation
   * - ``TestPhotoZCalibTable``
     - Photo-z calibration table construction

Luminosity & stellar mass functions (``test_lf_smf.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Test class
     - What it checks
   * - ``TestVmaxLF``
     - Output shapes, non-negativity, error ≤ phi, missing abs_mag raises ValueError
   * - ``TestVmaxSMF``
     - Same for stellar mass; missing log10_mstar raises ValueError
   * - ``TestSWML``
     - Convergence, shape normalisation, absolute normalisation matches Vmax
   * - ``TestCMinus``
     - Cumulative LF is monotone non-decreasing and positive
   * - ``TestMCComparison``
     - Cross-estimator consistency (Vmax vs SWML ratio within [0.5, 2]); :math:`|\tau| < 5` for no-evolution sample
   * - ``TestSchechterMass``
     - Positivity, peak near :math:`M^*`, JAX JIT/grad compatible
   * - ``TestDoubleSchechterMass``
     - Positivity, single-component limit, JAX JIT/grad compatible
   * - ``TestEddingtonKernel``
     - Positive, symmetric, peak at :math:`\Delta m = 0`
   * - ``TestConvolveSmfEddington``
     - Broadens the SMF (higher dispersion after convolution)

Physical recovery (``test_recovery.py``): **tier 3**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These tests generate mock catalogues from a known Schechter function and verify that the estimators recover the truth within physically motivated tolerances.

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it verifies
   * - ``TestVmaxRecovery::test_recovers_schechter_normalization``
     - Integrated :math:`1/V_\mathrm{max}` LF is positive and finite; peak within 2.5 mag of :math:`M^*`
   * - ``TestVmaxRecovery::test_shape_matches_schechter_slope``
     - Schechter fit to Vmax bins: :math:`|\hat\alpha - \alpha_\mathrm{true}| < 0.4`, :math:`|\hat M^* - M^*_\mathrm{true}| < 0.8` mag
   * - ``TestSWMLRecovery::test_swml_recovers_schechter_shape``
     - SWML/Vmax bin ratio within [0.1, 10] in populated bins
   * - ``TestSWMLRecovery::test_vmax_swml_integrated_density_consistent``
     - Integrated densities agree within factor 3
   * - ``TestCminusRecovery::test_cminus_cumulative_positive_and_finite``
     - :math:`C^-` LF is positive, finite, non-decreasing
   * - ``TestCminusRecovery::test_cminus_total_density_order_of_magnitude``
     - Total density in range :math:`10^{-6}`–:math:`1` Mpc\ :sup:`-3`
   * - ``TestSMFVmaxRecovery::test_smf_vmax_integrated_density``
     - Integrated SMF is positive and finite
   * - ``TestSMFVmaxRecovery::test_smf_vmax_peak_near_mstar``
     - SMF peak within 1.5 dex of true :math:`\log M^*`
   * - ``TestVmaxTruncation::test_individual_zmax_truncation_does_not_bias_normalization``
     - ``z_max_individual = z_max`` gives numerically identical result to default
   * - ``TestVmaxTruncation::test_lower_zmax_individual_increases_normalization``
     - Restricting :math:`z_{\max,i}` increases :math:`\phi` (smaller :math:`V_\mathrm{max}`)

Cosmology validation (``test_cosmology_validation.py``): **tier 3**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it verifies
   * - ``TestEinsteinDeSitter::test_comoving_distance_eds``
     - JAX :math:`\chi(z)` agrees with analytic EdS solution within 0.1% at :math:`z \in \{0.1, 0.5, 1, 2\}`
   * - ``TestEinsteinDeSitter::test_comoving_distance_array_eds``
     - Array and scalar paths give identical results
   * - ``TestPlanck18VsAstropy::test_comoving_distance_matches_astropy``
     - :math:`|\chi_\mathrm{JAX}(z) - \chi_\mathrm{astropy}(z)| < 0.5` Mpc at 20 redshifts
   * - ``TestPlanck18VsAstropy::test_comoving_volume_matches_astropy``
     - :math:`V_c` agrees within 0.1% for :math:`z > 0.1`
   * - ``TestPlanck18VsAstropy::test_angular_diameter_distance_vs_astropy``
     - :math:`D_A` agrees within 1%
   * - ``TestPlanck18VsAstropy::test_astropy_to_jax_cosmo_extractor``
     - :math:`h` and :math:`\Omega_m` extracted correctly
   * - ``TestComovingVolumeProperties::test_comoving_volume_monotone``
     - :math:`V_c(z)` strictly increasing from :math:`z = 0.01` to :math:`z = 5`
   * - ``TestComovingVolumeProperties::test_comoving_volume_zero_at_zero``
     - :math:`V_c(z \approx 0) \approx 0`
   * - ``TestComovingVolumeProperties::test_comoving_distance_positive``
     - :math:`\chi(z) > 0` for all :math:`z > 0`
   * - ``TestVmaxConsistency::test_vmax_sum_equals_number_density``
     - For a volume-limited sample: :math:`\sum 1/V_\mathrm{max} \approx N / V_\mathrm{survey}` within 1%

Two-point correlation functions (``test_twopcf_*.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestLandySzalayJax``
     - :math:`\hat w(\theta) \ge -1`; output shape matches bins
   * - ``TestDavisPeeblesJax``
     - :math:`\hat w(\theta) \ge -1`
   * - ``TestWThetaFromPairCounts``
     - Aggregated pair counts give consistent LS estimate
   * - ``TestWp``
     - Projected :math:`w_p(r_p) > 0`; output shape; units [Mpc]
   * - ``TestLegendreDecompose``
     - Monopole consistent with :math:`w_p`; quadrupole sign

Covariance (``test_covariance.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestAssignJackknifeRegions``
     - Correct number of regions; all objects assigned
   * - ``TestJackknifeFromSubsamples``
     - Covariance matrix shape; positive diagonal; scales as :math:`(N_\mathrm{jk}-1)/N_\mathrm{jk}`
   * - ``TestBootstrapCovariance``
     - Positive diagonal; consistent normalisation

Lensing (``test_lensing_esd.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestShearCalib``
     - :math:`\Sigma_\mathrm{crit}` is positive, finite, and increases with lens redshift

N(z) estimation (``test_nz.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestNzHistogram``
     - Output shape, non-negativity, normalisation to unity
   * - ``TestNzKde``
     - KDE output non-negative; integrates to approximately 1

----

Benchmark timing (``test_benchmarks.py``)
-------------------------------------------

These tests check that the core routines meet timing targets on a standard CPU. They are **not** run by default; pass the ``-s`` flag to see the timing output.

.. list-table::
   :header-rows: 1
   :widths: 45 30 25

   * - Benchmark
     - Input size
     - Target
   * - Comoving distance (JAX)
     - 1 000 redshifts
     - < 10 ms (after JIT warmup)
   * - :math:`w(\theta)` Landy-Szalay
     - 10 000 galaxies, 50 000 randoms
     - < 30 s
   * - :math:`w_p(r_p)` projected
     - 5 000 galaxies, 25 000 randoms
     - < 60 s
   * - Jackknife covariance
     - :math:`N_\mathrm{jk} = 100` sub-surveys
     - < 5 s

----

.. _test-coverage-gaps:

Test coverage gaps (pre-release)
----------------------------------

The table below lists the tier-3 physical recovery and literature comparison tests that must still be added before the package can be considered production-ready for its four stable estimators.

.. list-table::
   :header-rows: 1
   :widths: 30 12 30 28

   * - Missing test
     - Priority
     - Target file
     - Pass criterion
   * - **WPRP physical recovery**: recover the slope γ of a power-law w\ :sub:`p`\ (r\ :sub:`p`) from a Poisson-sampled anisotropic mock
     - **High**
     - ``tests/test_recovery_clustering.py``
     - :math:`|\hat\gamma - \gamma_\mathrm{true}| < 0.15` on scales 0.1–10 Mpc
   * - **WTHETA physical recovery**: recover w(θ) from a Poisson-sampled angular mock
     - **High**
     - ``tests/test_recovery_clustering.py``
     - Power-law slope within 0.15 of truth
   * - **DeltaSigma physical recovery**: recover ΔΣ(r\ :sub:`p`) from a synthetic NFW profile with known mass
     - **High**
     - ``tests/test_recovery_lensing.py``
     - Recovered M\ :sub:`200c` within 30% of truth
   * - **SMF vs COSMOS literature**: automate comparison against Ilbert+ (2013) COSMOS2015 SMF in ≥ 2 redshift bins
     - Medium
     - ``tests/test_literature_smf.py`` or ``docs/scripts/``
     - Δφ/φ < 30% in populated bins (accounting for cosmic variance)
   * - **WPRP vs GAMA literature**: automate comparison against Farrow+ (2015) GAMA w\ :sub:`p`
     - Medium
     - ``tests/test_literature_wprp.py``
     - Δw\ :sub:`p`/w\ :sub:`p` < 50% (field-variance dominated)
   * - **Jackknife covariance quality**: verify JK covariance is PSD and condition number is < threshold
     - Low
     - ``tests/test_covariance.py``
     - All eigenvalues > 0; condition number < 10⁴

Stub test files for the high-priority items are at ``tests/test_recovery_clustering.py`` and ``tests/test_recovery_lensing.py``. Each test is marked ``@pytest.mark.skip(reason="not yet implemented")`` so it appears in the test output as a reminder without blocking the suite.

----

Adding new tests
-----------------

When adding a new estimator or modifying an existing one, please add at minimum:

1. A **unit test** (correct output shape and sign, expected exceptions).
2. A **physical recovery test** if the estimator is a statistical estimator (verify on a mock with known truth).

The ``_make_schechter_lf_cat`` and ``_make_double_schechter_smf_cat`` helpers in ``tests/test_recovery.py`` can be reused to generate controlled mock catalogues. Use ``numpy.random.default_rng(seed)`` with a fixed seed so tests are reproducible.
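
As a rough illustration of the fixed-seed advice, the sketch below builds a toy Schechter mock and checks both reproducibility and recovery of the input slope. It does not use the package helpers, whose signatures are not shown on this page. ``make_toy_schechter_mags`` is a hypothetical stand-in, and the values ``ALPHA_TRUE = -0.5`` and ``MSTAR_TRUE = -21.0`` are assumptions chosen so that the luminosity distribution is a normalisable gamma distribution. No selection effects are modelled.

```python
import numpy as np

ALPHA_TRUE = -0.5   # assumed faint-end slope; > -1 so the gamma draw is normalisable
MSTAR_TRUE = -21.0  # assumed characteristic magnitude


def make_toy_schechter_mags(n, seed, alpha=ALPHA_TRUE, mstar=MSTAR_TRUE):
    """Draw absolute magnitudes from a Schechter function (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # In units of L/L*, a Schechter function with alpha > -1 is Gamma(alpha + 1, 1).
    lum_ratio = rng.gamma(shape=alpha + 1.0, scale=1.0, size=n)
    return mstar - 2.5 * np.log10(lum_ratio)


def test_fixed_seed_is_reproducible():
    # The same seed must give bit-identical mocks on every run.
    a = make_toy_schechter_mags(1000, seed=12345)
    b = make_toy_schechter_mags(1000, seed=12345)
    np.testing.assert_array_equal(a, b)


def test_recovers_alpha_from_mean_luminosity():
    mags = make_toy_schechter_mags(20000, seed=12345)
    lum_ratio = 10.0 ** (-0.4 * (mags - MSTAR_TRUE))
    # The mean of Gamma(k, 1) is k = alpha + 1, so alpha_hat = mean - 1.
    alpha_hat = lum_ratio.mean() - 1.0
    assert abs(alpha_hat - ALPHA_TRUE) < 0.05
```

Because the seed is fixed, a failure of the recovery check points to a change in the code under test and not to an unlucky random draw.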