Overview
========
``sum_stat`` measures summary statistics from galaxy and weak-lensing catalogues: luminosity function, stellar mass function, completeness curves, clustering and galaxy-galaxy lensing.
It leverages `Corrfunc `__ and
`dsigma `__ for clustering and lensing
summary statistics, respectively.
It writes self-describing HDF5 files, and is consumed by the companion **gga_model** package for parametric fitting.
Galaxy-sample weights (from imaging systematics) produced by the companion
`sys_mapping `__ package can be passed.
.. contents:: Contents
:local:
:depth: 1
Estimator maturity
------------------
The table below shows which estimators are **production-ready** and which are
still in active development. Only the four stable estimators should be used
for scientific analysis; the others may change without notice.
.. list-table::
:widths: 20 30 12 38
:header-rows: 1
* - Estimator
- Module
- Status
- Notes
* - SMF φ(M★)
- :mod:`sum_stat.lf_smf`
- **Stable**
- Validated against COSMOS, GAMA, and Uchuu mocks
* - w\ :sub:`p`\ (r\ :sub:`p`)
- :mod:`sum_stat.twopcf`
- **Stable**
- Validated against Uchuu mocks and BGS
* - w(θ)
- :mod:`sum_stat.twopcf`
- **Stable**
- Validated against Uchuu mocks and BGS
* - ΔΣ(r\ :sub:`p`)
- :mod:`sum_stat.lensing`
- **Stable**
- Validated on HSC Y3 / DES Y3 / KiDS sources
* - LF φ(M)
- :mod:`sum_stat.lf_smf`
- *Development*
- Same estimators as SMF; not yet validated against reference datasets
* - ξ(r), ξ\ :sub:`ℓ`\ (s)
- :mod:`sum_stat.twopcf`
- *Development*
- API and output format may change
* - C\ :sub:`ℓ`, P\ :sub:`ℓ`\ (k)
- :mod:`sum_stat.powspec`
- *Development*
- Not yet validated against literature
* - kNN-CDF
- :mod:`sum_stat.knn`
- *Development*
- Not yet validated against simulations
* - n(z)
- :mod:`sum_stat.nz`
- *Development*
- Utility; no validation suite yet
Summary statistics
------------------
A *summary statistic* compresses a galaxy survey into a low-dimensional quantity
that retains the cosmological information relevant to a given test.
The page :ref:`theory` describes how each estimator functions.
One-point statistics — density functions (stable: SMF)
------------------------------------------------------
One-point function estimators are in :mod:`sum_stat.lf_smf`.
The **stellar mass function (SMF)** estimators below are production-ready.
The luminosity function (LF) estimators use the same code but are listed
separately under :ref:`statistics-in-development` because they lack a
completed validation suite.
.. list-table::
:widths: 26 20 22 18 14
:header-rows: 1
* - Statistic
- Function
- Backend · Reference
- Required inputs
- CPU time (200k / 1M / 3M gal)
* - SMF φ(M\ :sub:`★`) — 1/Vmax
- :func:`~sum_stat.stellar_mass_function_vmax`
- JAX; `Schmidt (1968) `__
- cat: z, m\ :sub:`★`, w; area\ :sub:`sr`, cosmo
- < 1 s / 1 s / 3 s
* - SMF φ(M\ :sub:`★`) — SWML
- :func:`~sum_stat.lf_smf.swml.stellar_mass_function_swml`
- iterative; `Efstathiou et al. (1988) `__
- cat: z, m\ :sub:`★`, w; m\ :sub:`★,lim`, area\ :sub:`sr`, cosmo
- < 1 s / 1 s / ~3 s
* - SMF Φ(>M\ :sub:`★`) — C\ :sup:`−` cumulative
- :func:`~sum_stat.lf_smf.cminus.cumulative_stellar_mass_function_cminus`
- product O(N²); `Lynden-Bell (1971) `__
- cat: z, m\ :sub:`★`, w; m\ :sub:`★,lim`, area\ :sub:`sr`, cosmo
- 40 s / ~15 min / ~2 h
* - M★–z independence τ
- :func:`~sum_stat.lf_smf.independence.efron_petrosian_tau`
- rank O(N²); `Efron & Petrosian (1992) `__
- cat: z, m\ :sub:`★`; m\ :sub:`★,lim`, cosmo
- ~2 min / ~50 min / ~8 h
* - M★–z independence T\ :sub:`c`/T\ :sub:`v`
- :func:`~sum_stat.lf_smf.independence.rauzy_completeness`
- rank O(N²); `Rauzy (2001) `__
- cat: z, m\ :sub:`★`; m\ :sub:`★,lim`, cosmo
- ~2 min / ~50 min / ~8 h
* - Completeness scan T\ :sub:`c`\ (m★), T\ :sub:`v`\ (m★)
- :func:`~sum_stat.completeness_scan`
- rank O(N² × n_test);
`Johnston+ (2007) `__,
`Teodoro+ (2010) `__,
`Johnston+ (2012) `__
- m\ :sub:`app`, z, m\ :sub:`★`; mag\ :sub:`lim`, cosmo; optional bright limit & S/N target
- ~2 min / ~50 min / ~8 h (per scan point)
**Input notation** — ``cat:`` fields are attributes of
:class:`~sum_stat.GalaxyCatalogue` that must be set; the remaining items are
function arguments. z = ``redshift``; M = ``abs_mag``; m\ :sub:`★` =
``log10_mstar``; w = ``weight``; area\ :sub:`sr` = survey solid angle [sr];
mag\ :sub:`lim` = apparent-magnitude limit; m\ :sub:`★,lim` = per-galaxy
mass completeness limit [log₁₀(M\ :sub:`★`/M\ :sub:`☉`)].
Two-point statistics (stable: WPRP, WTHETA, DeltaSigma)
---------------------------------------------------------
Pair-counting assumes N_rand = 5 × N_gal. ΔΣ timing is source-catalogue dominated
(≈ 35 M for HSC Y3, ≈ 100 M for DES Y3).
.. list-table::
:widths: 24 20 24 18 14
:header-rows: 1
* - Statistic
- Function
- Backend · Reference
- Required inputs
- CPU time (200k / 1M / 3M gal)
* - Angular w(θ)
- :func:`~sum_stat.w_theta`
- treecorr LS; `Landy & Szalay (1993) `__,
`Jarvis et al. (2004) `__
- cat: α, δ (gal + rand)
- 8 s / 40 s / ~2 min
* - Projected w\ :sub:`p`\ (r\ :sub:`p`)
- :func:`~sum_stat.wp`
- Corrfunc ``DDrppi_mocks`` + LS; `Davis & Peebles (1983) `__,
`Sinha & Garrison (2020) `__
- cat: α, δ, z (gal + rand); π\ :sub:`max`, cosmo
- ~3 min / ~1 h / ~10 h
* - ESD ΔΣ(r\ :sub:`p`)
- :func:`~sum_stat.delta_sigma`
- dsigma; `Sheldon et al. (2004) `__,
`Mandelbaum et al. (2005) `__,
`Lange & Huang (2022) `__
- cat: α, δ, z; shape cat, cosmo
- hours (source-cat dominated)
**Input notation** — α, δ = ``ra``, ``dec`` [deg]; z = ``redshift``;
shape cat = :class:`~sum_stat.ShapeCatalogue` (HSC Y3 or DES Y3).
.. _statistics-in-development:
Statistics in development
--------------------------
.. warning::
The following statistics are **not yet production-ready**. APIs and output
formats may change without notice and results have not been validated against
reference data. They are disabled by default in the pipeline scripts
(``COMPUTE_STATS["multipoles"] = False``, ``COMPUTE_STATS["cl"] = False``).
Do not use them for scientific analysis. See :doc:`development_estimators`
for the detailed status and roadmap.
kNN CPU times assume pyfnntw (Rust parallel kd-tree) on 16 cores; query points
are drawn from the randoms catalogue.
.. list-table::
:widths: 24 20 24 18 14
:header-rows: 1
* - Statistic
- Function
- Backend · Reference
- Required inputs
- CPU time (200k / 1M / 3M gal)
* - LF φ(M) — 1/Vmax
- :func:`~sum_stat.luminosity_function_vmax`
- JAX; `Schmidt (1968) `__
- cat: z, M, w; area\ :sub:`sr`, cosmo
- < 1 s / 1 s / 3 s
* - LF φ(M) — SWML
- :func:`~sum_stat.lf_smf.swml.luminosity_function_swml`
- iterative; `Efstathiou et al. (1988) `__
- cat: z, M, w; mag\ :sub:`lim`, area\ :sub:`sr`, cosmo
- < 1 s / 1 s / ~3 s
* - LF Φ(`__
- cat: z, M, w; mag\ :sub:`lim`, area\ :sub:`sr`, cosmo
- 40 s / ~15 min / ~2 h
* - 3D ξ(r)
- :func:`~sum_stat.xi_r`
- treecorr LS; `Landy & Szalay (1993) `__,
`Jarvis et al. (2004) `__
- cat: α, δ, z (gal + rand); cosmo
- ~8 s / ~40 s / ~2 min
* - Multipoles ξ\ :sub:`ℓ`\ (s)
- :func:`~sum_stat.xi_multipoles`
- treecorr + JAX Legendre; `Hamilton (1992) `__
- cat: α, δ, z (gal + rand); cosmo
- ~8 s / ~40 s / ~2 min
* - Angular C\ :sub:`ℓ`
- :func:`~sum_stat.cl_angular`
- healpy pseudo-C\ :sub:`ℓ`; `Hivon et al. (2002) `__,
`Górski et al. (2005) `__
- cat: α, δ; nside, ℓ-bins
- 4 s / 4 s / 6 s
* - 3D P\ :sub:`ℓ`\ (k)
- :func:`~sum_stat.pk3d`, :func:`~sum_stat.pk_multipoles`
- JAX FFT; `Feldman et al. (1994) `__
- cat: α, δ, z; cosmo
- < 1 s / ~1 s / ~3 s
* - kNN-CDF F\ :sub:`k`\ (r)
- :func:`~sum_stat.knn_cdf`
- pyfnntw + JAX;
`Banerjee & Abel (2021) `__
- cat: z, α, δ; rand; k, r\_bins, cosmo
- < 10 s / < 1 min / < 5 min
* - Cross-kNN F\ :sub:`k`\ :sup:`A,B`\ (r)
- :func:`~sum_stat.cross_knn_cdf`
- pyfnntw + JAX;
`Banerjee & Abel (2021) `__
- cat A & B: z, α, δ; rand; k, r\_bins, cosmo
- < 20 s / < 2 min / < 10 min
* - kNN volume map V\ :sub:`k`\ (x)
- :func:`~sum_stat.knn_volume_map`
- pyfnntw;
`Banerjee & Abel (2021) `__
- cat: z, α, δ; query grid xyz; k, cosmo
- depends on grid size
See :doc:`api/knn` for the kNN mathematical definition, HDF5 schema, and
paper references.
JAX design
----------
Selected estimators (1/Vmax sums, Legendre decomposition, power-spectrum
post-processing) are implemented as JAX-native ``@jax.jit`` kernels, which
enables:
* **Automatic differentiation** — ``jax.grad`` and ``jax.hessian`` work
through JAX kernels, enabling Fisher matrices and MCMC gradients in
downstream modelling.
* **Vectorisation** — ``jax.vmap`` is available for batching over
parameter grids.
The heavy pair-counting backends (Corrfunc C++, treecorr C++, dsigma Cython)
and the iterative estimators (SWML, C⁻) are NumPy/SciPy based and are called
outside JAX boundaries.
Unit conventions
----------------
All outputs use **physical (h-free) Mpc** by default. This follows the astropy
convention: :func:`~astropy.cosmology.FLRW.comoving_distance` and
:func:`~astropy.cosmology.FLRW.comoving_volume` return Mpc and Mpc³ at face
value with no implicit h factor.
.. list-table::
:widths: 36 20 44
:header-rows: 1
* - Quantity
- Unit
- Notes
* - Comoving distances
- Mpc
- physical, not h⁻¹ Mpc
* - Comoving volumes (V\ :sub:`max`)
- Mpc³
- physical, not h⁻³ Mpc³
* - Luminosity function φ(M)
- Mpc⁻³ mag⁻¹
- physical, not h³ Mpc⁻³ mag⁻¹
* - Stellar mass function φ(M\ :sub:`★`)
- Mpc⁻³ dex⁻¹
- physical, not h³ Mpc⁻³ dex⁻¹
* - ESD ΔΣ(r\ :sub:`p`)
- M\ :sub:`☉` pc⁻²
- h-independent
* - Angular 2PCF w(θ)
- dimensionless
- h-independent
* - 3D 2PCF ξ(r), multipoles ξ\ :sub:`ℓ`\ (s), w\ :sub:`p`\ (r\ :sub:`p`)
- Mpc (separation), dimensionless (amplitude)
- physical Mpc
* - Wavenumber k (power spectrum only)
- **h/Mpc**
- sole exception: follows Fourier simulation convention
* - Power spectrum P(k)
- **(Mpc/h)³**
- sole exception; matches nbody/simulation convention
The value h = H₀/100 is stored in every HDF5 output file alongside H₀ under
the ``cosmology/h`` dataset so that the power-spectrum axes can be read
without arithmetic.
Script organisation
-------------------
Production measurements use standalone Python scripts in ``scripts/``.
See :doc:`running` for commands.
``measure_joint_sumstat.py`` is the **unified entry point** for all joint
measurements.
.. list-table::
:widths: 46 10 44
:header-rows: 1
* - Script
- Surveys
- Purpose
* - ``scripts/measure_joint_sumstat.py``
- bgs · mock · custom
- **Unified joint measurement**: any combination of SMF, LF, WP,
ESD_HSC, ESD_DES, ESD_KIDS, WTHETA, KNN; configurable bins;
jackknife or bootstrap; crash-recovery checkpoints
* - ``scripts/plot_joint_sumstat.py``
- bgs · mock · custom
- Diagnostic plots for the HDF5 output of ``measure_joint_sumstat.py``
Output format
-------------
The unified script ``measure_joint_sumstat.py`` writes all active statistics
into a single HDF5 file per sample × weight variant. Bin edges are fully
configurable via ``--bins-*`` (see :ref:`bin-spec`). Weight variants produce
separate files; the variant suffix is appended to the filename
(e.g. ``…-sys-comb.h5``; ``uniform`` has no suffix).
HDF5 schema (``measure_joint_sumstat.py``)::
{sample_id}_joint_{stats_slug}[-{weight}].h5
├── smf/{sample_id}/
│ ├── cosmology/ H0, h, Om0, Ob0, Ok0
│ ├── bin_edges [dex]
│ ├── log10mstar_centres [dex]
│ ├── phi [Mpc⁻³ dex⁻¹]
│ ├── phi_err [Mpc⁻³ dex⁻¹]
│ └── cov [Mpc⁻⁶ dex⁻²]
├── lf/{sample_id}/
│ ├── M_centres [mag]
│ ├── phi [Mpc⁻³ mag⁻¹]
│ ├── phi_err [Mpc⁻³ mag⁻¹]
│ └── cov
├── twopcf/wp_{sample_id}/
│ ├── sep_centres [Mpc]
│ ├── xi [Mpc]
│ └── cov
├── twopcf/wtheta_{sample_id}/
│ ├── sep_centres [arcmin]
│ ├── xi [dimensionless]
│ └── cov
├── esd/{sample_id}_{HSC|DES|KIDS}/
│ ├── rp_centres [Mpc]
│ ├── delta_sigma [M_sun/pc²]
│ └── cov
├── knn/{sample_id}/
│ ├── r_centres [Mpc]
│ ├── k_values [dimensionless]
│ ├── F_k [dimensionless] shape (n_k, n_r)
│ └── F_k_poisson [dimensionless] Erlang reference
├── joint_covariance/
│ ├── data_vector concatenated statistic vector
│ ├── err sqrt(diag(cov))
│ ├── cov full joint matrix
│ ├── subsamples JK or bootstrap draws [N_iters × N_bins]
│ └── attrs: statistics, slice_*, covariance_method
└── _checkpoint/
└── subsamples in-progress iterations (NaN = not yet done)
Every dataset carries ``unit`` and ``description`` attributes.
The file-level ``git_hash`` attribute records the exact code revision.
See :doc:`data_format` for the complete attribute table.
.. _bin-spec:
Bin specification
-----------------
All ``--bins-*`` flags in ``measure_joint_sumstat.py`` accept one of three
formats:
.. list-table::
:widths: 35 65
:header-rows: 1
* - Format
- Result
* - ``log:N:MIN:MAX``
- ``np.logspace(log10(MIN), log10(MAX), N+1)`` — N log-spaced bins
* - ``lin:N:MIN:MAX``
- ``np.linspace(MIN, MAX, N+1)`` — N linearly spaced bins
* - ``arange:START:STOP:STEP``
- ``np.arange(START, STOP, STEP)`` — uniform spacing
Examples::
--bins-rp log:15:0.1:40 # 15 log bins, 0.1–40 Mpc
--bins-mstar arange:10.5:12.6:0.1 # 0.1 dex steps from 10.5 to 12.5
--bins-theta log:20:0.05:100 # 20 log bins, 0.05–100 arcmin
--bins-knn-r log:10:1:30 # 10 log bins, 1–30 Mpc
--knn-k 1 2 3 # k = 1, 2, 3 neighbours
.. seealso::
A full reference list with annotated entries and ADS links is available
at :doc:`bibliography`.