.. _testing:

Test suite and validation
==========================

This page describes the test philosophy, summarises the tests in each module, and
explains how to run the full suite.  The tests are organised in three tiers of
increasing physical rigour.

.. contents:: Contents
   :depth: 2
   :local:

----

Test philosophy
----------------

A cosmological estimator can pass every unit test and still be systematically
wrong.  For example, a sign error in the distance-modulus formula would not
be caught by a "check that :math:`\phi > 0`" test, yet it would produce an LF
shifted by many magnitudes from the truth.

For this reason the test suite is structured in three tiers:

.. list-table:: Test tiers
   :header-rows: 1
   :widths: 20 30 50

   * - Tier
     - Purpose
     - How to spot a failure
   * - **Unit**
     - Correct output shapes, signs, and error conditions
     - Shape mismatch, negative :math:`\phi`, wrong exception
   * - **Consistency**
     - Independent estimators agree on the same mock data
     - Ratio :math:`\phi_\mathrm{SWML} / \phi_{1/V_\mathrm{max}}` outside [0.3, 3]
   * - **Physical recovery**
     - Estimator returns the *correct* answer on known input
     - Recovered :math:`\alpha` or :math:`M^*` deviates from the input value by more than the test's stated tolerance

Most existing tests (``test_lf_smf.py``, ``test_twopcf_*.py``) are in tiers 1–2.
``test_recovery.py`` and ``test_cosmology_validation.py`` add tier-3 coverage.
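The distance-modulus example in the first paragraph of this section can be made
concrete.  The sketch below is illustrative only: ``distance_modulus``,
``abs_mag`` and ``abs_mag_sign_bug`` are hypothetical helpers, not functions from
this package, and K-corrections are ignored.  A tier-1 style check (right shape,
finite values) passes for both the correct and the sign-flipped conversion; only
a tier-3 style comparison against a known input exposes the bug.

.. code-block:: python

   import numpy as np

   # Hypothetical helpers for illustration only (not the package API).
   def distance_modulus(d_l_mpc):
       """mu = 5 log10(d_L / 10 pc) for a luminosity distance given in Mpc."""
       return 5.0 * np.log10(np.asarray(d_l_mpc) * 1e6 / 10.0)

   def abs_mag(m_app, d_l_mpc):
       """Correct conversion: M = m - mu."""
       return m_app - distance_modulus(d_l_mpc)

   def abs_mag_sign_bug(m_app, d_l_mpc):
       """Deliberately wrong conversion: M = m + mu."""
       return m_app + distance_modulus(d_l_mpc)

   # Known truth: galaxies with M = -21 placed at several distances.
   d_l = np.array([50.0, 200.0, 1000.0])           # Mpc
   m_true = -21.0
   m_app = m_true + distance_modulus(d_l)

   # Tier-1 style check: shape and finiteness. Both versions pass.
   for f in (abs_mag, abs_mag_sign_bug):
       out = f(m_app, d_l)
       assert out.shape == d_l.shape and np.all(np.isfinite(out))

   # Tier-3 style check: compare with the known input. The buggy version is
   # off by 2 * mu, i.e. tens of magnitudes at these distances.
   bias_ok = abs_mag(m_app, d_l) - m_true
   bias_bug = abs_mag_sign_bug(m_app, d_l) - m_true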

----

Running the tests
------------------

.. code-block:: bash

   # Activate the environment first
   conda activate sum_stat

   # Run all tests
   pytest tests/ -v

   # Run only the physical recovery tests (slow — generates mock catalogues)
   pytest tests/test_recovery.py -v -s

   # Run only the cosmology validation tests
   pytest tests/test_cosmology_validation.py -v -s

   # Run the timing benchmarks
   pytest tests/test_benchmarks.py -v -s

Expected total run time is about 3–5 minutes.  The bottleneck is mock-catalogue
generation in the recovery tests, which calls astropy's ``z_at_value``.

----

Per-module test summaries
--------------------------

Catalogue (``test_catalogue.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Test
     - What it checks
   * - ``TestGalaxyCatalogue``
     - Constructor, shape validation, default weights, optional fields
   * - ``TestShapeCatalogue``
     - Ellipticity shape checks, weight normalisation
   * - ``TestPhotoZCalibTable``
     - Photo-z calibration table construction

Luminosity & stellar mass functions (``test_lf_smf.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Test class
     - What it checks
   * - ``TestVmaxLF``
     - Output shapes, non-negativity, error :math:`\le \phi`, missing ``abs_mag`` raises ``ValueError``
   * - ``TestVmaxSMF``
     - Same checks for stellar mass; missing ``log10_mstar`` raises ``ValueError``
   * - ``TestSWML``
     - Convergence, shape normalisation, absolute normalisation matches :math:`1/V_\mathrm{max}`
   * - ``TestCMinus``
     - Cumulative LF is monotone non-decreasing and positive
   * - ``TestMCComparison``
     - Cross-estimator consistency (:math:`1/V_\mathrm{max}` vs SWML ratio within [0.5, 2]); :math:`|\tau| < 5` for a no-evolution sample
   * - ``TestSchechterMass``
     - Positivity, peak near :math:`M^*`, JAX JIT/grad compatible
   * - ``TestDoubleSchechterMass``
     - Positivity, single-component limit, JAX JIT/grad compatible
   * - ``TestEddingtonKernel``
     - Positive, symmetric, peak at :math:`\Delta m = 0`
   * - ``TestConvolveSmfEddington``
     - Broadens the SMF (higher dispersion after convolution)
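The ``error ≤ phi`` check in ``TestVmaxLF`` follows from the usual
:math:`1/V_\mathrm{max}` estimator with Poisson errors,
:math:`\phi_j = \Delta M_j^{-1}\sum_i 1/V_{\mathrm{max},i}` and
:math:`\sigma_j = \Delta M_j^{-1}\,(\sum_i 1/V_{\mathrm{max},i}^2)^{1/2}`.  Because
every weight is positive, :math:`(\sum_i w_i^2)^{1/2} \le \sum_i w_i`, with
equality only for a single-galaxy bin.  The sketch below assumes this error
definition; ``vmax_lf`` is a hypothetical stand-in, not the package's estimator,
and the input catalogue is random.

.. code-block:: python

   import numpy as np

   def vmax_lf(abs_mag, vmax, bin_edges):
       """Hypothetical binned 1/Vmax LF with Poisson errors (illustration only).

       phi_j   = sum_{i in j} 1/Vmax_i          / dM_j
       sigma_j = sqrt(sum_{i in j} 1/Vmax_i**2) / dM_j
       """
       if abs_mag is None:
           raise ValueError("abs_mag is required")
       abs_mag = np.asarray(abs_mag, dtype=float)
       w = 1.0 / np.asarray(vmax, dtype=float)
       if abs_mag.shape != w.shape:
           raise ValueError("abs_mag and vmax must have the same shape")
       dm = np.diff(bin_edges)
       sum_w, _ = np.histogram(abs_mag, bins=bin_edges, weights=w)
       sum_w2, _ = np.histogram(abs_mag, bins=bin_edges, weights=w**2)
       return sum_w / dm, np.sqrt(sum_w2) / dm

   # Unit-style checks on a random catalogue (values are arbitrary).
   rng = np.random.default_rng(42)
   mags = rng.uniform(-23.0, -17.0, size=500)
   vmax = rng.uniform(1e4, 1e6, size=500)          # Mpc^3
   edges = np.linspace(-23.0, -17.0, 13)           # 12 bins of 0.5 mag
   phi, err = vmax_lf(mags, vmax, edges)

   assert phi.shape == err.shape == (12,)          # output shapes
   assert np.all(phi >= 0)                         # non-negativity
   assert np.all(err <= phi)                       # sqrt(sum w^2) <= sum w
   try:
       vmax_lf(None, vmax, edges)
   except ValueError:
       pass
   else:
       raise AssertionError("missing abs_mag should raise ValueError")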

Physical recovery (``test_recovery.py``) — **tier 3**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These tests generate mock catalogues from a known Schechter function and verify
that the estimators recover the truth within physically motivated tolerances.
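A minimal way to draw mock absolute magnitudes from a known Schechter function is
inverse-transform sampling on a fine magnitude grid.  The sketch below is
hypothetical and deliberately simplified: it samples magnitudes only, whereas a
realistic mock (such as those built by ``_make_schechter_lf_cat``, whose internals
are not reproduced here) must also assign redshifts and apply a flux limit.  It
uses the magnitude form :math:`\phi(M) = 0.4\ln 10\,\phi^*\,x^{\alpha+1}e^{-x}`
with :math:`x = 10^{0.4(M^* - M)}`.

.. code-block:: python

   import numpy as np

   def schechter_mag(m, phi_star, m_star, alpha):
       """Schechter LF per unit magnitude: 0.4 ln10 phi* x^(alpha+1) exp(-x)."""
       x = 10.0 ** (0.4 * (m_star - m))
       return 0.4 * np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

   def sample_schechter_mags(n, m_star, alpha, m_bright, m_faint, rng, n_grid=4096):
       """Hypothetical inverse-transform sampler for M in [m_bright, m_faint]."""
       grid = np.linspace(m_bright, m_faint, n_grid)
       pdf = schechter_mag(grid, 1.0, m_star, alpha)   # phi* cancels on normalising
       # Cumulative trapezoid integral, normalised to a CDF running from 0 to 1.
       cdf = np.concatenate(([0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))))
       cdf /= cdf[-1]
       # Map uniform deviates through the inverse CDF by linear interpolation.
       return np.interp(rng.uniform(size=n), cdf, grid)

   rng = np.random.default_rng(2024)
   mags = sample_schechter_mags(20_000, m_star=-20.5, alpha=-1.2,
                                m_bright=-24.0, m_faint=-16.0, rng=rng)

With :math:`\alpha = -1.2` the faint end rises, so most draws are fainter than
:math:`M^*`; reusing the same seed reproduces the same mock exactly.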

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it verifies
   * - ``TestVmaxRecovery::test_recovers_schechter_normalization``
     - Integrated :math:`1/V_\mathrm{max}` LF is positive and finite; peak within 2.5 mag of :math:`M^*`
   * - ``TestVmaxRecovery::test_shape_matches_schechter_slope``
     - Schechter fit to Vmax bins: :math:`|\hat\alpha - \alpha_\mathrm{true}| < 0.4`, :math:`|\hat M^* - M^*_\mathrm{true}| < 0.8` mag
   * - ``TestSWMLRecovery::test_swml_recovers_schechter_shape``
     - SWML/Vmax bin ratio within [0.1, 10] in populated bins
   * - ``TestSWMLRecovery::test_vmax_swml_integrated_density_consistent``
     - Integrated densities agree within a factor of 3
   * - ``TestCminusRecovery::test_cminus_cumulative_positive_and_finite``
     - :math:`C^-` LF is positive, finite, non-decreasing
   * - ``TestCminusRecovery::test_cminus_total_density_order_of_magnitude``
     - Total density in range :math:`10^{-6}`–:math:`1` Mpc\ :sup:`-3`
   * - ``TestSMFVmaxRecovery::test_smf_vmax_integrated_density``
     - Integrated SMF is positive and finite
   * - ``TestSMFVmaxRecovery::test_smf_vmax_peak_near_mstar``
     - SMF peak within 1.5 dex of true :math:`\log M^*`
   * - ``TestVmaxTruncation::test_individual_zmax_truncation_does_not_bias_normalization``
     - ``z_max_individual = z_max`` gives a result numerically identical to the default
   * - ``TestVmaxTruncation::test_lower_zmax_individual_increases_normalization``
     - Restricting :math:`z_{\max,i}` increases :math:`\phi` (smaller :math:`V_\mathrm{max}`)
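The :math:`\hat\alpha` and :math:`\hat M^*` tolerances in
``test_shape_matches_schechter_slope`` presuppose a Schechter fit to the binned
estimate.  One simple way to do such a fit, sketched below, is a weighted
least-squares grid search in which :math:`\phi^*` is solved in closed form, since
the model is linear in it.  ``fit_schechter_grid`` is hypothetical; the package
may use a different optimiser, and evaluating the model at bin centres (rather
than averaging it over each bin) is an approximation.

.. code-block:: python

   import numpy as np

   def schechter_mag(m, phi_star, m_star, alpha):
       """Schechter LF per unit magnitude: 0.4 ln10 phi* x^(alpha+1) exp(-x)."""
       x = 10.0 ** (0.4 * (m_star - m))
       return 0.4 * np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

   def fit_schechter_grid(m_centres, phi, phi_err, alphas, m_stars):
       """Hypothetical weighted least-squares Schechter fit by grid search.

       phi* enters the model linearly, so for each (alpha, M*) it is solved
       in closed form. Returns (alpha_hat, m_star_hat, phi_star_hat).
       """
       ok = (phi > 0) & (phi_err > 0)                 # populated bins only
       m, y, ivar = m_centres[ok], phi[ok], 1.0 / phi_err[ok] ** 2
       best_chi2, best = np.inf, None
       for a in alphas:
           for ms in m_stars:
               f = schechter_mag(m, 1.0, ms, a)
               ps = np.sum(ivar * y * f) / np.sum(ivar * f * f)
               chi2 = np.sum(ivar * (y - ps * f) ** 2)
               if chi2 < best_chi2:
                   best_chi2, best = chi2, (a, ms, ps)
       return best

   # Binned "LF" evaluated exactly from known parameters, with 10% errors.
   centres = np.arange(-22.75, -17.0, 0.5)            # 12 bin centres
   truth = {"phi_star": 5e-3, "m_star": -20.7, "alpha": -1.1}
   phi = schechter_mag(centres, **truth)
   alpha_hat, m_star_hat, phi_star_hat = fit_schechter_grid(
       centres, phi, 0.1 * phi,
       alphas=np.linspace(-1.6, -0.6, 101), m_stars=np.linspace(-21.5, -20.0, 76))

   # Tolerances quoted for test_shape_matches_schechter_slope.
   assert abs(alpha_hat - truth["alpha"]) < 0.4
   assert abs(m_star_hat - truth["m_star"]) < 0.8

On noiseless input the grid point at the true parameters has essentially zero
:math:`\chi^2`, so the fit lands on it; with noisy mock data the recovered values
scatter around the truth, which is what the tolerances absorb.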

Cosmology validation (``test_cosmology_validation.py``) — **tier 3**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it verifies
   * - ``TestEinsteinDeSitter::test_comoving_distance_eds``
     - JAX :math:`\chi(z)` agrees with analytic EdS solution within 0.1% at :math:`z \in \{0.1, 0.5, 1, 2\}`
   * - ``TestEinsteinDeSitter::test_comoving_distance_array_eds``
     - Array and scalar paths give identical results
   * - ``TestPlanck18VsAstropy::test_comoving_distance_matches_astropy``
     - :math:`|\chi_\mathrm{JAX}(z) - \chi_\mathrm{astropy}(z)| < 0.5` Mpc at 20 redshifts
   * - ``TestPlanck18VsAstropy::test_comoving_volume_matches_astropy``
     - :math:`V_c` agrees within 0.1% for :math:`z > 0.1`
   * - ``TestPlanck18VsAstropy::test_angular_diameter_distance_vs_astropy``
     - :math:`D_A` agrees within 1%
   * - ``TestPlanck18VsAstropy::test_astropy_to_jax_cosmo_extractor``
     - :math:`h` and :math:`\Omega_m` extracted correctly
   * - ``TestComovingVolumeProperties::test_comoving_volume_monotone``
     - :math:`V_c(z)` strictly increasing from :math:`z = 0.01` to :math:`z = 5`
   * - ``TestComovingVolumeProperties::test_comoving_volume_zero_at_zero``
     - :math:`V_c(z \approx 0) \approx 0`
   * - ``TestComovingVolumeProperties::test_comoving_distance_positive``
     - :math:`\chi(z) > 0` for all :math:`z > 0`
   * - ``TestVmaxConsistency::test_vmax_sum_equals_number_density``
     - For a volume-limited sample: :math:`\sum 1/V_\mathrm{max} \approx N / V_\mathrm{survey}` within 1%
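The Einstein-de Sitter test has a closed-form target: for :math:`\Omega_m = 1`,
:math:`\chi(z) = (2c/H_0)\,[1 - (1+z)^{-1/2}]`.  A minimal NumPy sketch of the same
comparison follows.  It is not the package's JAX implementation: the function
names are hypothetical, it assumes a flat universe, integrates :math:`c/H(z)` with
the trapezoid rule, and uses the flat-space full-sky volume
:math:`V_c = \tfrac{4\pi}{3}\chi^3` scaled by a sky fraction.

.. code-block:: python

   import numpy as np

   C_KM_S = 299792.458  # speed of light [km/s]

   def comoving_distance(z, h0=70.0, omega_m=1.0, n_steps=10_000):
       """Hypothetical chi(z) [Mpc] for flat matter + Lambda, trapezoid rule."""
       zs = np.linspace(0.0, z, n_steps + 1)
       inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
       return C_KM_S / h0 * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))

   def comoving_distance_eds(z, h0=70.0):
       """Analytic Einstein-de Sitter result: chi = (2c/H0) [1 - (1+z)^(-1/2)]."""
       return 2.0 * C_KM_S / h0 * (1.0 - 1.0 / np.sqrt(1.0 + z))

   def comoving_volume(z, f_sky=1.0, **cosmo):
       """Comoving volume [Mpc^3] inside z (flat space only), times sky fraction."""
       return f_sky * 4.0 / 3.0 * np.pi * comoving_distance(z, **cosmo) ** 3

   # EdS check at the redshifts used by the test, with the 0.1% tolerance.
   for z in (0.1, 0.5, 1.0, 2.0):
       assert abs(comoving_distance(z) / comoving_distance_eds(z) - 1.0) < 1e-3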

Two-point correlation functions (``test_twopcf_*.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestLandySzalayJax``
     - :math:`\hat w(\theta) \ge -1`; output shape matches the number of bins
   * - ``TestDavisPeeblesJax``
     - :math:`\hat w(\theta) \ge -1`
   * - ``TestWThetaFromPairCounts``
     - Aggregated pair counts give consistent LS estimate
   * - ``TestWp``
     - Projected :math:`w_p(r_p) > 0`; output shape; units [Mpc]
   * - ``TestLegendreDecompose``
     - Monopole consistent with :math:`w_p`; quadrupole sign
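Both angular estimators can be written in terms of pair counts normalised by the
total number of distinct pairs: :math:`DD/[N_D(N_D-1)/2]`, :math:`DR/(N_D N_R)` and
:math:`RR/[N_R(N_R-1)/2]`.  The sketch below (hypothetical helper names, NumPy
only, taking pre-computed counts per bin) shows that the Davis-Peebles bound
:math:`\hat w \ge -1` holds by construction, since :math:`DD \ge 0`.  For
Landy-Szalay the same bound is equivalent to :math:`DD + 2RR \ge 2DR` in
normalised counts, which is expected when the randoms trace the survey geometry
but is not algebraically guaranteed.

.. code-block:: python

   import numpy as np

   # Hypothetical helpers for illustration only (not the package API).
   def _normalise(dd, dr, rr, n_d, n_r):
       """Divide raw counts of distinct pairs by the total number of such pairs."""
       dd_n = np.asarray(dd, dtype=float) / (n_d * (n_d - 1) / 2.0)
       dr_n = np.asarray(dr, dtype=float) / (n_d * n_r)
       rr_n = np.asarray(rr, dtype=float) / (n_r * (n_r - 1) / 2.0)
       return dd_n, dr_n, rr_n

   def landy_szalay(dd, dr, rr, n_d, n_r):
       dd_n, dr_n, rr_n = _normalise(dd, dr, rr, n_d, n_r)
       return (dd_n - 2.0 * dr_n + rr_n) / rr_n

   def davis_peebles(dd, dr, n_d, n_r):
       dd_n, dr_n, _ = _normalise(dd, dr, 0.0, n_d, n_r)
       return dd_n / dr_n - 1.0

   # If DD, DR and RR each hold the same fraction (10%) of their total pairs,
   # the bin shows no excess clustering and both estimators return 0.
   n_d, n_r = 10, 20
   w_ls = landy_szalay([4.5], [20.0], [19.0], n_d, n_r)
   w_dp = davis_peebles([4.5], [20.0], n_d, n_r)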

Covariance (``test_covariance.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestAssignJackknifeRegions``
     - Correct number of regions; all objects assigned
   * - ``TestJackknifeFromSubsamples``
     - Covariance matrix shape; positive diagonal; :math:`(N_\mathrm{jk}-1)/N_\mathrm{jk}` normalisation prefactor
   * - ``TestBootstrapCovariance``
     - Positive diagonal; consistent normalisation
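The :math:`(N_\mathrm{jk}-1)/N_\mathrm{jk}` prefactor is the standard delete-one
jackknife normalisation,
:math:`C = \frac{N_\mathrm{jk}-1}{N_\mathrm{jk}}\sum_k (x_k-\bar x)(x_k-\bar x)^\mathsf{T}`,
where :math:`x_k` is the statistic measured with region :math:`k` removed.  It is
larger than the ordinary sample covariance of the :math:`x_k` by a factor
:math:`(N_\mathrm{jk}-1)^2/N_\mathrm{jk}`, because leave-one-out estimates share
most of their data and scatter far less than independent realisations.  A NumPy
sketch with hypothetical function names (not the package's API):

.. code-block:: python

   import numpy as np

   def jackknife_covariance(subsamples):
       """Hypothetical delete-one jackknife covariance.

       subsamples has shape (N_jk, n_bins); row k is the statistic measured with
       jackknife region k removed.
       """
       x = np.asarray(subsamples, dtype=float)
       n_jk = x.shape[0]
       dx = x - x.mean(axis=0)
       return ((n_jk - 1) / n_jk) * (dx.T @ dx)

   def bootstrap_covariance(resamples):
       """Hypothetical helper: sample covariance of bootstrap resamples (N_boot - 1)."""
       return np.cov(np.asarray(resamples, dtype=float), rowvar=False, ddof=1)

   rng = np.random.default_rng(7)
   x_jk = rng.normal(size=(100, 5))     # stand-in for 100 leave-one-out estimates
   c_jk = jackknife_covariance(x_jk)
   c_boot = bootstrap_covariance(rng.normal(size=(200, 5)))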

Lensing (``test_lensing_esd.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestShearCalib``
     - :math:`\Sigma_\mathrm{crit}` is positive, finite, and increases with lens redshift

N(z) estimation (``test_nz.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Test
     - What it checks
   * - ``TestNzHistogram``
     - Output shape, non-negativity, normalisation to unity
   * - ``TestNzKde``
     - KDE output non-negative; integrates to approximately 1
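Both normalisation checks reduce to a unit-integral condition.  In the sketch
below (hypothetical helpers; the package's kernel and bandwidth choice may
differ), the histogram uses ``np.histogram(..., density=True)``, which integrates
to exactly 1 over the bin range.  A Gaussian KDE integrates to 1 only
approximately: the integral is evaluated numerically on a finite grid, and any
kernel mass outside that grid (for example below :math:`z = 0`, if the grid
starts there) is lost.

.. code-block:: python

   import numpy as np

   # Hypothetical helpers for illustration only (not the package API).
   def nz_histogram(z, bin_edges):
       """Histogram estimate of n(z), normalised so that sum(n * dz) = 1."""
       nz, _ = np.histogram(z, bins=bin_edges, density=True)
       return nz

   def nz_gaussian_kde(z, z_grid, bandwidth):
       """Fixed-bandwidth Gaussian-kernel estimate of n(z) on z_grid."""
       z = np.asarray(z, dtype=float)[:, None]
       u = (np.asarray(z_grid, dtype=float)[None, :] - z) / bandwidth
       return np.exp(-0.5 * u**2).sum(axis=0) / (z.shape[0] * bandwidth * np.sqrt(2.0 * np.pi))

   def trapezoid(y, x):
       """Trapezoid-rule integral of y(x)."""
       return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

   rng = np.random.default_rng(3)
   z = 2.0 * rng.beta(2.0, 5.0, size=1000)       # toy redshifts in [0, 2]
   edges = np.linspace(0.0, 2.0, 41)
   grid = np.linspace(-1.0, 3.0, 2001)           # wide enough to hold the kernel tails

   nz_h = nz_histogram(z, edges)
   nz_k = nz_gaussian_kde(z, grid, bandwidth=0.05)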

----

Benchmark timing (``test_benchmarks.py``)
-------------------------------------------

These tests check that the core routines meet the timing targets below on a standard CPU.
They are **not** run by default; when running them, pass ``-s`` so that pytest does not
capture the printed timing output.

.. list-table::
   :header-rows: 1
   :widths: 45 30 25

   * - Benchmark
     - Input size
     - Target
   * - Comoving distance (JAX)
     - 1 000 redshifts
     - < 10 ms (after JIT warmup)
   * - :math:`w(\theta)` Landy-Szalay
     - 10 000 galaxies, 50 000 randoms
     - < 30 s
   * - :math:`w_p(r_p)` projected
     - 5 000 galaxies, 25 000 randoms
     - < 60 s
   * - Jackknife covariance
     - :math:`N_\mathrm{jk} = 100` sub-surveys
     - < 5 s
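The "after JIT warmup" qualifier matters because the first call to a
``jax.jit``-compiled function includes compilation time, and JAX dispatches work
asynchronously.  A minimal, framework-agnostic timing helper could look like the
sketch below; ``time_after_warmup`` is hypothetical, and ``test_benchmarks.py``
may time things differently.

.. code-block:: python

   import time
   import numpy as np

   def time_after_warmup(fn, *args, n_repeat=5, block=None):
       """Hypothetical helper: median wall-clock time [s] of fn(*args) after one warm-up call.

       The warm-up call absorbs one-off costs such as JIT compilation. ``block``
       is an optional callback applied to each result before the timer stops.
       """
       out = fn(*args)                     # warm-up (e.g. triggers compilation)
       if block is not None:
           block(out)
       times = []
       for _ in range(n_repeat):
           t0 = time.perf_counter()
           out = fn(*args)
           if block is not None:
               block(out)                  # wait for asynchronous work to finish
           times.append(time.perf_counter() - t0)
       return float(np.median(times))

   z = np.random.default_rng(0).uniform(0.0, 3.0, size=1000)
   t_sort = time_after_warmup(np.sort, z)

For a JAX function ``f`` one would pass
``block=lambda out: out.block_until_ready()`` so that the timer stops only once the
result has actually been computed.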

----

.. _test-coverage-gaps:

Test coverage gaps (pre-release)
----------------------------------

The table below lists the tier-3 physical recovery and literature comparison
tests that still need to be added before the package can be considered
production-ready for its four stable estimators.

.. list-table::
   :header-rows: 1
   :widths: 30 12 30 28

   * - Missing test
     - Priority
     - Target file
     - Pass criterion
   * - **WPRP physical recovery**: recover the power-law slope γ of w\ :sub:`p`\ (r\ :sub:`p`)
       from a Poisson-sampled anisotropic mock
     - **High**
     - ``tests/test_recovery_clustering.py``
     - :math:`|\hat\gamma - \gamma_\mathrm{true}| < 0.15` on scales 0.1–10 Mpc
   * - **WTHETA physical recovery**: recover w(θ) from a Poisson-sampled angular mock
     - **High**
     - ``tests/test_recovery_clustering.py``
     - Power-law slope within 0.15 of truth
   * - **DeltaSigma physical recovery**: recover ΔΣ(r\ :sub:`p`) from a synthetic NFW
       profile with known mass
     - **High**
     - ``tests/test_recovery_lensing.py``
     - Recovered M\ :sub:`200c` within 30% of truth
   * - **SMF vs COSMOS literature**: automate comparison against the Ilbert+ (2013)
       COSMOS SMF in ≥ 2 redshift bins
     - Medium
     - ``tests/test_literature_smf.py`` or ``docs/scripts/``
     - Δφ/φ < 30% in populated bins (accounting for cosmic variance)
   * - **WPRP vs GAMA literature**: automate comparison against Farrow+ (2015)
       GAMA w\ :sub:`p`
     - Medium
     - ``tests/test_literature_wprp.py``
     - Δw\ :sub:`p`/w\ :sub:`p` < 50% (field-variance dominated)
   * - **Jackknife covariance quality**: verify the JK covariance is positive definite
       and its condition number is below a threshold
     - Low
     - ``tests/test_covariance.py``
     - All eigenvalues > 0; condition number < 10⁴
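The pass criterion in the last row can be checked with a symmetric
eigendecomposition: every eigenvalue strictly positive, and a 2-norm condition
number :math:`\lambda_\mathrm{max}/\lambda_\mathrm{min}` below :math:`10^4`.  The
helper name and threshold argument below are hypothetical.  Note that a
delete-one jackknife covariance has rank at most :math:`N_\mathrm{jk}-1`, so it
cannot pass this check unless :math:`N_\mathrm{jk}` exceeds the number of
data-vector bins.

.. code-block:: python

   import numpy as np

   def covariance_quality(cov, max_condition=1e4):
       """Hypothetical check returning (passes, condition_number).

       passes is True when every eigenvalue is > 0 and
       lambda_max / lambda_min < max_condition.
       """
       cov = np.asarray(cov, dtype=float)
       if cov.ndim != 2 or cov.shape[0] != cov.shape[1] or not np.allclose(cov, cov.T):
           raise ValueError("covariance must be a symmetric square matrix")
       eig = np.linalg.eigvalsh(cov)          # eigenvalues in ascending order
       if eig[0] <= 0:
           return False, np.inf
       cond = eig[-1] / eig[0]
       return bool(cond < max_condition), float(cond)

   ok, cond = covariance_quality(np.diag([1.0, 2.0, 3.0]))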

Stub test files for the high-priority items are at
``tests/test_recovery_clustering.py`` and ``tests/test_recovery_lensing.py``.
Each test is marked ``@pytest.mark.skip(reason="not yet implemented")`` so it
appears in the test output as a reminder without blocking the suite.
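For illustration, a stub in this style could look like the sketch below.  The
function names and docstrings are placeholders, not the contents of the real stub
files; ``pytest.mark.skip(reason=...)`` is the standard pytest marker, and running
``pytest -rs`` lists each skipped test together with its reason.

.. code-block:: python

   import pytest

   # Placeholder stubs: names and docstrings are hypothetical.
   @pytest.mark.skip(reason="not yet implemented")
   def test_wprp_recovers_power_law_slope():
       """Placeholder: recover gamma of w_p(r_p) within 0.15 of the input value."""

   @pytest.mark.skip(reason="not yet implemented")
   def test_wtheta_recovers_power_law_slope():
       """Placeholder: recover the w(theta) slope within 0.15 of the input value."""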

----

Adding new tests
-----------------

When adding a new estimator or modifying an existing one, please add at minimum:

1. A **unit test** (correct output shape and sign, expected exceptions).
2. A **physical recovery test** if the estimator is a statistical estimator
   (verify on a mock with known truth).

The ``_make_schechter_lf_cat`` and ``_make_double_schechter_smf_cat`` helpers in
``tests/test_recovery.py`` can be reused to generate controlled mock catalogues.

Use ``numpy.random.default_rng(seed)`` with a fixed seed so tests are reproducible.
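A minimal pattern that combines a seeded generator with a tier-3 style check is
sketched below.  It uses an exponential distribution purely as a stand-in with a
known maximum-likelihood estimator (the sample mean); ``make_exponential_mock`` is
hypothetical, and a real test would use the Schechter helpers named above.  The
tolerance is about five standard errors, :math:`5\,\sigma/\sqrt{N}`, so a correct
estimator essentially never fails while a biased one does.

.. code-block:: python

   import numpy as np

   def make_exponential_mock(n, scale, seed):
       """Hypothetical mock generator: n draws from an exponential with known scale."""
       rng = np.random.default_rng(seed)            # fixed seed -> reproducible mock
       return rng.exponential(scale=scale, size=n)

   def test_mock_is_reproducible():
       a = make_exponential_mock(1000, 2.0, seed=1234)
       b = make_exponential_mock(1000, 2.0, seed=1234)
       np.testing.assert_array_equal(a, b)

   def test_recovers_known_scale():
       scale_true, n = 2.0, 10_000
       x = make_exponential_mock(n, scale_true, seed=1234)
       scale_hat = x.mean()                         # maximum-likelihood estimate
       tol = 5.0 * scale_true / np.sqrt(n)          # about five standard errors
       assert abs(scale_hat - scale_true) < tol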