.. _api-knn:

k-Nearest-Neighbor Statistics: API
==================================

.. warning::

   This module is **in active development** and has not yet been validated
   against reference simulations. Results and APIs may change without notice.
   Do not use for scientific analysis. See :doc:`../development_estimators`
   for the current status and what is needed before promotion to stable.

The k-nearest-neighbor cumulative distribution function (kNN-CDF) is a
flexible summary statistic that is sensitive to all connected N-point
correlation functions through a single estimator. Banerjee & Abel (2021)
report that it constrains cosmological parameters more tightly than
traditional two-point statistics, especially on small and intermediate scales.

.. contents:: Contents
   :local:
   :depth: 1

Mathematical Definition
-----------------------

Consider a galaxy catalogue with positions :math:`\{\mathbf{x}_i\}` and a set
of :math:`N_q` random *query points* :math:`\{\mathbf{q}_j\}` drawn uniformly
from the survey volume. For each query point, define:

.. math::

   d_k(\mathbf{q}_j) \;=\; \text{distance from } \mathbf{q}_j
   \text{ to its } k\text{-th nearest galaxy}

The **kNN-CDF** is the empirical distribution of these distances, i.e. the
fraction of query points with at least :math:`k` galaxies within :math:`r`:

.. math::

   F_k(r) \;=\; \frac{1}{N_q}
   \#\!\left\{ j : d_k(\mathbf{q}_j) \le r \right\}

For a homogeneous Poisson point process with mean number density
:math:`\bar{n}`, the kNN-CDF is the regularised lower incomplete gamma
function evaluated at the mean galaxy count inside a sphere of radius
:math:`r` (the CDF of an Erlang distribution in the enclosed volume):

.. math::

   F_k^{\mathrm{Pois}}(r) \;=\;
   \frac{\gamma\!\left(k,\; \bar{n}\,\tfrac{4\pi}{3}\,r^3\right)}{\Gamma(k)}

The deviation :math:`F_k(r) - F_k^{\mathrm{Pois}}(r)` encodes clustering
beyond a Poisson baseline. Where :math:`F_k` exceeds the Poisson value at
fixed :math:`r`, more query points than expected have at least :math:`k`
galaxies within :math:`r` (*overdensities*, galaxies closer than expected);
where it falls below, more query points than expected lie in underdense
regions (*voids*).

**kNN sphere volume** at a query point:

.. math::

   V_k(\mathbf{q}) \;=\; \tfrac{4\pi}{3}\,d_k(\mathbf{q})^3

This quantity is used for 2-D and 3-D density maps; see
:func:`~sum_stat.knn_volume_map`.

Backend: pyfnntw
-----------------

Nearest-neighbor queries use pyfnntw (v0.4.1), a Python binding for the fnntw
Rust crate, which implements a parallel, cache-optimised kd-tree. The tree is
built with ``leafsize=32`` and queries run across all available CPUs
automatically.

If ``pyfnntw`` is not installed, the implementation falls back to
:class:`scipy.spatial.cKDTree`.

Install the primary backend with:

.. code-block:: bash

   pip install pyfnntw   # requires the Rust toolchain (rustup.rs)

Quick-Start Example
-------------------

.. code-block:: python

   import numpy as np
   import sum_stat as ss
   from astropy.cosmology import FlatLambdaCDM

   cosmo = FlatLambdaCDM(H0=67.74, Om0=0.3089)

   # --- Load catalogues (replace each ... with your own arrays) ---
   gal = ss.GalaxyCatalogue(ra=..., dec=..., redshift=...)
   rand = ss.GalaxyCatalogue(ra=..., dec=..., redshift=...)  # survey randoms

   # --- Compute kNN-CDF for k = 1 … 5 ---
   k_values = np.arange(1, 6)
   r_bins = np.logspace(-1, 1.5, 21)   # 21 edges (20 bins), 0.1 to 31.6 Mpc

   r_centres, F_k, F_k_poisson = ss.knn_cdf(
       gal, cosmo, k_values, r_bins,
       rand=rand,          # query points drawn from randoms
       n_query=100_000,
   )
   # F_k.shape == (5, 20)

   # --- Cross-kNN between two populations ---
   # gal_a and gal_b are two GalaxyCatalogue instances, built like gal above
   r_c, F_k_a, F_k_b = ss.cross_knn_cdf(
       gal_a, gal_b, cosmo, k_values, r_bins,
       rand=rand,
   )

   # --- Density map (V_k at a regular grid) ---
   from sum_stat.knn import comoving_xyz
   gal_xyz = comoving_xyz(gal, cosmo)   # (N_gal, 3) [Mpc]
   grid_xyz = ...                       # your custom lattice
   vols = ss.knn_volume_map(gal, grid_xyz, [1, 2, 3, 4], cosmo)

   # --- Write to HDF5 ---
   with ss.SummaryStatWriter("results.h5") as w:
       w.write_knn(
           "knn/BGS_bright",
           r_centres, F_k, F_k_poisson,
           r_bins, k_values.astype(float),
           cosmo,
           {"survey": "DESI-BGS", "n_query": 100_000},
       )

Output HDF5 Schema
------------------

::

   knn/{sample_name}/
   ├── attrs: estimator="knn-cdf", n_query, n_gal, survey, …
   ├── r_centres      [unit: "Mpc"]
   ├── bin_edges      [unit: "Mpc"]
   ├── k_values       [unit: "dimensionless"]
   ├── F_k            [shape: (n_k, n_r), unit: "dimensionless"]
   ├── F_k_poisson    [shape: (n_k, n_r), unit: "dimensionless"]
   └── cosmology/     H0, h, Om0, Ob0, Ok0

API Reference
-------------

.. autofunction:: sum_stat.knn_cdf

.. autofunction:: sum_stat.cross_knn_cdf

.. autofunction:: sum_stat.knn_volume_map

.. autofunction:: sum_stat.knn_poisson_cdf

.. autofunction:: sum_stat.comoving_xyz

References
----------

* Banerjee A. & Abel T. (2021), *Nearest neighbour distributions: New
  statistical measures for cosmological clustering*, MNRAS 500, 5479.
* Banerjee A., Abel T. & Neyrinck M. (2021), MNRAS 504, 2911.
* Banerjee A. & Abel T. (2023), MNRAS 519, 4856.
* Yuan S. et al. (2023), MNRAS 522, 3935.
* Gao Y. et al. (2025), MNRAS 543, 3409.
* Obreschkow D. et al. (2025), arXiv:2502.09709.
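
Worked Sketch: Poisson Baseline Check
-------------------------------------

The definitions under Mathematical Definition can be checked numerically with a few lines of NumPy and SciPy. The sketch below is illustrative only and is not the ``sum_stat`` implementation: it assumes points in a periodic unit box, calls :class:`scipy.spatial.cKDTree` directly rather than the pyfnntw backend, and the helper names ``empirical_knn_cdf`` and ``poisson_knn_cdf`` are hypothetical. For uniformly random points, the empirical kNN-CDF should agree with the incomplete-gamma expression up to sampling noise.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammainc  # regularised lower incomplete gamma P(a, x)


def empirical_knn_cdf(data_xyz, query_xyz, k_values, r_edges, boxsize=None):
    """Fraction of query points whose k-th nearest data point lies within r."""
    k_values = np.asarray(k_values, dtype=int)
    tree = cKDTree(data_xyz, boxsize=boxsize)  # boxsize enables periodic wrapping
    dist, _ = tree.query(query_xyz, k=int(k_values.max()))
    dist = dist.reshape(len(query_xyz), -1)    # (N_q, k_max), also when k_max == 1
    d_k = dist[:, k_values - 1]                # (N_q, n_k): column k-1 holds d_k
    # Compare every d_k against every radius, then average over query points
    return (d_k[:, :, None] <= np.asarray(r_edges)[None, None, :]).mean(axis=0)


def poisson_knn_cdf(k_values, r_edges, nbar):
    """Poisson expectation: P(k, nbar * 4/3 * pi * r^3)."""
    mean_count = nbar * 4.0 / 3.0 * np.pi * np.asarray(r_edges) ** 3
    k = np.asarray(k_values, dtype=float)
    return gammainc(k[:, None], mean_count[None, :])


# Assumed toy setup: uniform random "galaxies" and query points in a unit box
rng = np.random.default_rng(0)
n_gal, n_query = 50_000, 20_000
data_xyz = rng.random((n_gal, 3))
query_xyz = rng.random((n_query, 3))

k_values = [1, 2, 4]
r_edges = np.linspace(0.005, 0.05, 20)   # well below half the box size

F_emp = empirical_knn_cdf(data_xyz, query_xyz, k_values, r_edges, boxsize=1.0)
F_pois = poisson_knn_cdf(k_values, r_edges, nbar=n_gal / 1.0**3)

# For an unclustered sample this residual should be sampling noise only
max_residual = float(np.max(np.abs(F_emp - F_pois)))
print(f"max |F_k - F_k^Pois| = {max_residual:.4f}")
```

Running the same comparison on a clustered catalogue would instead give a residual that carries the clustering signal described above.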