Overview ======== ``sum_stat`` measures summary statistics from galaxy and weak-lensing catalogues: luminosity function, stellar mass function, completeness curves, clustering and galaxy-galaxy lensing. It leverages `Corrfunc `__ and `dsigma `__ for clustering and lensing summary statistics, respectively. It writes self-describing HDF5 files, and is consumed by the companion **gga_model** package for parametric fitting. Galaxy-sample weights (from imaging systematics) produced by the companion `sys_mapping `__ package can be passed. .. contents:: Contents :local: :depth: 1 Estimator maturity ------------------ The table below shows which estimators are **production-ready** and which are still in active development. Only the four stable estimators should be used for scientific analysis; the others may change without notice. .. list-table:: :widths: 20 30 12 38 :header-rows: 1 * - Estimator - Module - Status - Notes * - SMF φ(M★) - :mod:`sum_stat.lf_smf` - **Stable** - Validated against COSMOS, GAMA, and Uchuu mocks * - w\ :sub:`p`\ (r\ :sub:`p`) - :mod:`sum_stat.twopcf` - **Stable** - Validated against Uchuu mocks and BGS * - w(θ) - :mod:`sum_stat.twopcf` - **Stable** - Validated against Uchuu mocks and BGS * - ΔΣ(r\ :sub:`p`) - :mod:`sum_stat.lensing` - **Stable** - Validated on HSC Y3 / DES Y3 / KiDS sources * - LF φ(M) - :mod:`sum_stat.lf_smf` - *Development* - Same estimators as SMF; not yet validated against reference datasets * - ξ(r), ξ\ :sub:`ℓ`\ (s) - :mod:`sum_stat.twopcf` - *Development* - API and output format may change * - C\ :sub:`ℓ`, P\ :sub:`ℓ`\ (k) - :mod:`sum_stat.powspec` - *Development* - Not yet validated against literature * - kNN-CDF - :mod:`sum_stat.knn` - *Development* - Not yet validated against simulations * - n(z) - :mod:`sum_stat.nz` - *Development* - Utility; no validation suite yet Summary statistics ------------------ A *summary statistic* compresses a galaxy survey into a low-dimensional quantity that retains the cosmological information relevant to a given test. The page :ref:`theory` describes how each estimator functions. One-point statistics — density functions (stable: SMF) ------------------------------------------------------ One-point function estimators are in :mod:`sum_stat.lf_smf`. The **stellar mass function (SMF)** estimators below are production-ready. The luminosity function (LF) estimators use the same code but are listed separately under :ref:`statistics-in-development` because they lack a completed validation suite. .. list-table:: :widths: 26 20 22 18 14 :header-rows: 1 * - Statistic - Function - Backend · Reference - Required inputs - CPU time (200k / 1M / 3M gal) * - SMF φ(M\ :sub:`★`) — 1/Vmax - :func:`~sum_stat.stellar_mass_function_vmax` - JAX; `Schmidt (1968) `__ - cat: z, m\ :sub:`★`, w; area\ :sub:`sr`, cosmo - < 1 s / 1 s / 3 s * - SMF φ(M\ :sub:`★`) — SWML - :func:`~sum_stat.lf_smf.swml.stellar_mass_function_swml` - iterative; `Efstathiou et al. (1988) `__ - cat: z, m\ :sub:`★`, w; m\ :sub:`★,lim`, area\ :sub:`sr`, cosmo - < 1 s / 1 s / ~3 s * - SMF Φ(>M\ :sub:`★`) — C\ :sup:`−` cumulative - :func:`~sum_stat.lf_smf.cminus.cumulative_stellar_mass_function_cminus` - product O(N²); `Lynden-Bell (1971) `__ - cat: z, m\ :sub:`★`, w; m\ :sub:`★,lim`, area\ :sub:`sr`, cosmo - 40 s / ~15 min / ~2 h * - M★–z independence τ - :func:`~sum_stat.lf_smf.independence.efron_petrosian_tau` - rank O(N²); `Efron & Petrosian (1992) `__ - cat: z, m\ :sub:`★`; m\ :sub:`★,lim`, cosmo - ~2 min / ~50 min / ~8 h * - M★–z independence T\ :sub:`c`/T\ :sub:`v` - :func:`~sum_stat.lf_smf.independence.rauzy_completeness` - rank O(N²); `Rauzy (2001) `__ - cat: z, m\ :sub:`★`; m\ :sub:`★,lim`, cosmo - ~2 min / ~50 min / ~8 h * - Completeness scan T\ :sub:`c`\ (m★), T\ :sub:`v`\ (m★) - :func:`~sum_stat.completeness_scan` - rank O(N² × n_test); `Johnston+ (2007) `__, `Teodoro+ (2010) `__, `Johnston+ (2012) `__ - m\ :sub:`app`, z, m\ :sub:`★`; mag\ :sub:`lim`, cosmo; optional bright limit & S/N target - ~2 min / ~50 min / ~8 h (per scan point) **Input notation** — ``cat:`` fields are attributes of :class:`~sum_stat.GalaxyCatalogue` that must be set; the remaining items are function arguments. z = ``redshift``; M = ``abs_mag``; m\ :sub:`★` = ``log10_mstar``; w = ``weight``; area\ :sub:`sr` = survey solid angle [sr]; mag\ :sub:`lim` = apparent-magnitude limit; m\ :sub:`★,lim` = per-galaxy mass completeness limit [log₁₀(M\ :sub:`★`/M\ :sub:`☉`)]. Two-point statistics (stable: WPRP, WTHETA, DeltaSigma) --------------------------------------------------------- Pair-counting assumes N_rand = 5 × N_gal. ΔΣ timing is source-catalogue dominated (≈ 35 M for HSC Y3, ≈ 100 M for DES Y3). .. list-table:: :widths: 24 20 24 18 14 :header-rows: 1 * - Statistic - Function - Backend · Reference - Required inputs - CPU time (200k / 1M / 3M gal) * - Angular w(θ) - :func:`~sum_stat.w_theta` - treecorr LS; `Landy & Szalay (1993) `__, `Jarvis et al. (2004) `__ - cat: α, δ (gal + rand) - 8 s / 40 s / ~2 min * - Projected w\ :sub:`p`\ (r\ :sub:`p`) - :func:`~sum_stat.wp` - Corrfunc ``DDrppi_mocks`` + LS; `Davis & Peebles (1983) `__, `Sinha & Garrison (2020) `__ - cat: α, δ, z (gal + rand); π\ :sub:`max`, cosmo - ~3 min / ~1 h / ~10 h * - ESD ΔΣ(r\ :sub:`p`) - :func:`~sum_stat.delta_sigma` - dsigma; `Sheldon et al. (2004) `__, `Mandelbaum et al. (2005) `__, `Lange & Huang (2022) `__ - cat: α, δ, z; shape cat, cosmo - hours (source-cat dominated) **Input notation** — α, δ = ``ra``, ``dec`` [deg]; z = ``redshift``; shape cat = :class:`~sum_stat.ShapeCatalogue` (HSC Y3 or DES Y3). .. _statistics-in-development: Statistics in development -------------------------- .. warning:: The following statistics are **not yet production-ready**. APIs and output formats may change without notice and results have not been validated against reference data. They are disabled by default in the pipeline scripts (``COMPUTE_STATS["multipoles"] = False``, ``COMPUTE_STATS["cl"] = False``). Do not use them for scientific analysis. See :doc:`development_estimators` for the detailed status and roadmap. kNN CPU times assume pyfnntw (Rust parallel kd-tree) on 16 cores; query points are drawn from the randoms catalogue. .. list-table:: :widths: 24 20 24 18 14 :header-rows: 1 * - Statistic - Function - Backend · Reference - Required inputs - CPU time (200k / 1M / 3M gal) * - LF φ(M) — 1/Vmax - :func:`~sum_stat.luminosity_function_vmax` - JAX; `Schmidt (1968) `__ - cat: z, M, w; area\ :sub:`sr`, cosmo - < 1 s / 1 s / 3 s * - LF φ(M) — SWML - :func:`~sum_stat.lf_smf.swml.luminosity_function_swml` - iterative; `Efstathiou et al. (1988) `__ - cat: z, M, w; mag\ :sub:`lim`, area\ :sub:`sr`, cosmo - < 1 s / 1 s / ~3 s * - LF Φ(`__ - cat: z, M, w; mag\ :sub:`lim`, area\ :sub:`sr`, cosmo - 40 s / ~15 min / ~2 h * - 3D ξ(r) - :func:`~sum_stat.xi_r` - treecorr LS; `Landy & Szalay (1993) `__, `Jarvis et al. (2004) `__ - cat: α, δ, z (gal + rand); cosmo - ~8 s / ~40 s / ~2 min * - Multipoles ξ\ :sub:`ℓ`\ (s) - :func:`~sum_stat.xi_multipoles` - treecorr + JAX Legendre; `Hamilton (1992) `__ - cat: α, δ, z (gal + rand); cosmo - ~8 s / ~40 s / ~2 min * - Angular C\ :sub:`ℓ` - :func:`~sum_stat.cl_angular` - healpy pseudo-C\ :sub:`ℓ`; `Hivon et al. (2002) `__, `Górski et al. (2005) `__ - cat: α, δ; nside, ℓ-bins - 4 s / 4 s / 6 s * - 3D P\ :sub:`ℓ`\ (k) - :func:`~sum_stat.pk3d`, :func:`~sum_stat.pk_multipoles` - JAX FFT; `Feldman et al. (1994) `__ - cat: α, δ, z; cosmo - < 1 s / ~1 s / ~3 s * - kNN-CDF F\ :sub:`k`\ (r) - :func:`~sum_stat.knn_cdf` - pyfnntw + JAX; `Banerjee & Abel (2021) `__ - cat: z, α, δ; rand; k, r\_bins, cosmo - < 10 s / < 1 min / < 5 min * - Cross-kNN F\ :sub:`k`\ :sup:`A,B`\ (r) - :func:`~sum_stat.cross_knn_cdf` - pyfnntw + JAX; `Banerjee & Abel (2021) `__ - cat A & B: z, α, δ; rand; k, r\_bins, cosmo - < 20 s / < 2 min / < 10 min * - kNN volume map V\ :sub:`k`\ (x) - :func:`~sum_stat.knn_volume_map` - pyfnntw; `Banerjee & Abel (2021) `__ - cat: z, α, δ; query grid xyz; k, cosmo - depends on grid size See :doc:`api/knn` for the kNN mathematical definition, HDF5 schema, and paper references. JAX design ---------- Selected estimators (1/Vmax sums, Legendre decomposition, power-spectrum post-processing) are implemented as JAX-native ``@jax.jit`` kernels, which enables: * **Automatic differentiation** — ``jax.grad`` and ``jax.hessian`` work through JAX kernels, enabling Fisher matrices and MCMC gradients in downstream modelling. * **Vectorisation** — ``jax.vmap`` is available for batching over parameter grids. The heavy pair-counting backends (Corrfunc C++, treecorr C++, dsigma Cython) and the iterative estimators (SWML, C⁻) are NumPy/SciPy based and are called outside JAX boundaries. Unit conventions ---------------- All outputs use **physical (h-free) Mpc** by default. This follows the astropy convention: :func:`~astropy.cosmology.FLRW.comoving_distance` and :func:`~astropy.cosmology.FLRW.comoving_volume` return Mpc and Mpc³ at face value with no implicit h factor. .. list-table:: :widths: 36 20 44 :header-rows: 1 * - Quantity - Unit - Notes * - Comoving distances - Mpc - physical, not h⁻¹ Mpc * - Comoving volumes (V\ :sub:`max`) - Mpc³ - physical, not h⁻³ Mpc³ * - Luminosity function φ(M) - Mpc⁻³ mag⁻¹ - physical, not h³ Mpc⁻³ mag⁻¹ * - Stellar mass function φ(M\ :sub:`★`) - Mpc⁻³ dex⁻¹ - physical, not h³ Mpc⁻³ dex⁻¹ * - ESD ΔΣ(r\ :sub:`p`) - M\ :sub:`☉` pc⁻² - h-independent * - Angular 2PCF w(θ) - dimensionless - h-independent * - 3D 2PCF ξ(r), multipoles ξ\ :sub:`ℓ`\ (s), w\ :sub:`p`\ (r\ :sub:`p`) - Mpc (separation), dimensionless (amplitude) - physical Mpc * - Wavenumber k (power spectrum only) - **h/Mpc** - sole exception: follows Fourier simulation convention * - Power spectrum P(k) - **(Mpc/h)³** - sole exception; matches nbody/simulation convention The value h = H₀/100 is stored in every HDF5 output file alongside H₀ under the ``cosmology/h`` dataset so that the power-spectrum axes can be read without arithmetic. Script organisation ------------------- Production measurements use standalone Python scripts in ``scripts/``. See :doc:`running` for commands. ``measure_joint_sumstat.py`` is the **unified entry point** for all joint measurements. .. list-table:: :widths: 46 10 44 :header-rows: 1 * - Script - Surveys - Purpose * - ``scripts/measure_joint_sumstat.py`` - bgs · mock · custom - **Unified joint measurement**: any combination of SMF, LF, WP, ESD_HSC, ESD_DES, ESD_KIDS, WTHETA, KNN; configurable bins; jackknife or bootstrap; crash-recovery checkpoints * - ``scripts/plot_joint_sumstat.py`` - bgs · mock · custom - Diagnostic plots for the HDF5 output of ``measure_joint_sumstat.py`` Output format ------------- The unified script ``measure_joint_sumstat.py`` writes all active statistics into a single HDF5 file per sample × weight variant. Bin edges are fully configurable via ``--bins-*`` (see :ref:`bin-spec`). Weight variants produce separate files; the variant suffix is appended to the filename (e.g. ``…-sys-comb.h5``; ``uniform`` has no suffix). HDF5 schema (``measure_joint_sumstat.py``):: {sample_id}_joint_{stats_slug}[-{weight}].h5 ├── smf/{sample_id}/ │ ├── cosmology/ H0, h, Om0, Ob0, Ok0 │ ├── bin_edges [dex] │ ├── log10mstar_centres [dex] │ ├── phi [Mpc⁻³ dex⁻¹] │ ├── phi_err [Mpc⁻³ dex⁻¹] │ └── cov [Mpc⁻⁶ dex⁻²] ├── lf/{sample_id}/ │ ├── M_centres [mag] │ ├── phi [Mpc⁻³ mag⁻¹] │ ├── phi_err [Mpc⁻³ mag⁻¹] │ └── cov ├── twopcf/wp_{sample_id}/ │ ├── sep_centres [Mpc] │ ├── xi [Mpc] │ └── cov ├── twopcf/wtheta_{sample_id}/ │ ├── sep_centres [arcmin] │ ├── xi [dimensionless] │ └── cov ├── esd/{sample_id}_{HSC|DES|KIDS}/ │ ├── rp_centres [Mpc] │ ├── delta_sigma [M_sun/pc²] │ └── cov ├── knn/{sample_id}/ │ ├── r_centres [Mpc] │ ├── k_values [dimensionless] │ ├── F_k [dimensionless] shape (n_k, n_r) │ └── F_k_poisson [dimensionless] Erlang reference ├── joint_covariance/ │ ├── data_vector concatenated statistic vector │ ├── err sqrt(diag(cov)) │ ├── cov full joint matrix │ ├── subsamples JK or bootstrap draws [N_iters × N_bins] │ └── attrs: statistics, slice_*, covariance_method └── _checkpoint/ └── subsamples in-progress iterations (NaN = not yet done) Every dataset carries ``unit`` and ``description`` attributes. The file-level ``git_hash`` attribute records the exact code revision. See :doc:`data_format` for the complete attribute table. .. _bin-spec: Bin specification ----------------- All ``--bins-*`` flags in ``measure_joint_sumstat.py`` accept one of three formats: .. list-table:: :widths: 35 65 :header-rows: 1 * - Format - Result * - ``log:N:MIN:MAX`` - ``np.logspace(log10(MIN), log10(MAX), N+1)`` — N log-spaced bins * - ``lin:N:MIN:MAX`` - ``np.linspace(MIN, MAX, N+1)`` — N linearly spaced bins * - ``arange:START:STOP:STEP`` - ``np.arange(START, STOP, STEP)`` — uniform spacing Examples:: --bins-rp log:15:0.1:40 # 15 log bins, 0.1–40 Mpc --bins-mstar arange:10.5:12.6:0.1 # 0.1 dex steps from 10.5 to 12.5 --bins-theta log:20:0.05:100 # 20 log bins, 0.05–100 arcmin --bins-knn-r log:10:1:30 # 10 log bins, 1–30 Mpc --knn-k 1 2 3 # k = 1, 2, 3 neighbours .. seealso:: A full reference list with annotated entries and ADS links is available at :doc:`bibliography`.