API reference¶

This page documents the public Cell-GPS functions intended for direct use in analysis scripts. The import path is cellgps.

Core COSTE and StructureMap functions¶

cellgps.compute_cophenetic_distances_from_df(df: DataFrame, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, celltype_col: str = 'celltype', output_dir: str | None = None, method: str = 'average', show_corr: bool = False) → Tuple[DataFrame, DataFrame][source]¶

Compute and return cophenetic distance matrices in both row and column dimensions, then apply linear normalization to [0, 1] for each separately.

If z_col is provided, uses (x, y, z) for distance computation; otherwise uses only (x, y).

Parameters:¶

df (pd.DataFrame): DataFrame containing cell data.
x_col, y_col, z_col (str, optional): Column names for spatial coordinates. z_col defaults to None.
celltype_col (str, optional): Column name for cell type.
output_dir (Optional[str]): Output file directory; if None, uses the current working directory.
method (str, optional): Linkage method for hierarchical clustering. Defaults to “average”.
show_corr (bool, optional): Whether to print the cophenetic correlation coefficient for rows and columns. Defaults to False.

Returns:¶

Tuple[pd.DataFrame, pd.DataFrame]: Row and column cophenetic distance matrices, both normalized to [0, 1].

cellgps.compute_cophenetic_distances_from_adata(adata: anndata.AnnData, cluster_col: str = 'Cluster', output_dir: str | None = None, method: str = 'average') → Tuple[DataFrame, DataFrame][source]¶

Compute and return cophenetic distance matrices in both row and column dimensions (using cophenet), then apply linear normalization to [0,1] for each separately.

Unlike the previous version, min and max values are computed independently for rows and columns.

cellgps.compute_searcher_findee_distance_matrix_from_df(df: DataFrame, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, celltype_col: str = 'celltype') → DataFrame[source]¶

Compute and return a directed inter-cluster average nearest-neighbor distance matrix. Row and column indices are the clusters (cell types) present in df; rows represent “Searcher” clusters and columns represent “Findee” clusters. Each element is the mean, taken over all cells in the row cluster, of the distance to the nearest cell in the column cluster, so the matrix is generally not symmetric. Clusters with no cells in the data do not appear in the result matrix.

Parameters:¶

df (pd.DataFrame): DataFrame containing cell coordinates and type data.
x_col, y_col (str, optional): Column names for cell x/y coordinates. Defaults to “x” and “y”.
z_col (Optional[str], optional): Column name for the z coordinate. If None (the default), only x and y are used.
celltype_col (str, optional): Column name for cell type / cluster labels. Defaults to “celltype”.

Returns:¶

pd.DataFrame: Distance matrix DataFrame with cluster names as index and columns. Shape is (n_clusters, n_clusters); each value is the average nearest-neighbor distance from the row cluster to the column cluster, or NaN if it cannot be computed.
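To make the directed "Searcher" to "Findee" definition concrete, here is a minimal pure-Python sketch of the same idea for 2D points. It is not the Cell-GPS implementation: the name searcher_findee_matrix and the nested-dict output are hypothetical, and skipping a cell's own position within its cluster is an assumption the reference does not confirm.

```python
from collections import defaultdict
from math import dist, nan


def searcher_findee_matrix(points):
    """points: iterable of (x, y, celltype).

    Returns {searcher: {findee: mean nearest-neighbor distance}}.
    """
    by_type = defaultdict(list)
    for x, y, label in points:
        by_type[label].append((x, y))

    matrix = {}
    for searcher, src in by_type.items():
        matrix[searcher] = {}
        for findee, dst in by_type.items():
            nearest = []
            for i, p in enumerate(src):
                # Assumption: within a cluster, a cell does not count as its own neighbor.
                candidates = [
                    dist(p, q)
                    for j, q in enumerate(dst)
                    if not (searcher == findee and i == j)
                ]
                if candidates:
                    nearest.append(min(candidates))
            # NaN marks pairs for which no neighbor exists.
            matrix[searcher][findee] = sum(nearest) / len(nearest) if nearest else nan
    return matrix


toy_points = [(0.0, 0.0, "A"), (0.0, 2.0, "A"), (3.0, 0.0, "B")]
toy_matrix = searcher_findee_matrix(toy_points)
print(toy_matrix["A"]["B"], toy_matrix["B"]["A"])
```

In this toy input the A→B entry averages over two A cells while the B→A entry uses the single B cell, which is why the two directions differ.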

cellgps.compute_cophenetic_from_distance_matrix(distance_matrix: DataFrame, method: str = 'average', show_corr: bool = False) → Tuple[DataFrame, DataFrame][source]¶

Perform hierarchical clustering in both row and column directions on the given inter-cluster distance matrix, and compute cophenetic distance matrices. Results are independently normalized to [0,1] for rows and columns.

Parameters:¶

distance_matrix (pd.DataFrame): Input distance matrix with source clusters as rows and target clusters as columns (e.g. output of compute_searcher_findee_distance_matrix_from_df).
method (str, optional): Linkage method for hierarchical clustering. Defaults to “average”.
show_corr (bool, optional): Whether to print the cophenetic correlation coefficient (printed separately for rows and columns). Defaults to False.

Returns:¶

Tuple[pd.DataFrame, pd.DataFrame]: (row_coph, col_coph). Cophenetic distance matrices (DataFrames) for row and column clusters, each independently normalized to [0,1].
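The row and column clustering described above can be sketched with SciPy. This is an illustration under assumptions, not the library's code: it assumes rows (and, for the column tree, columns) are passed to scipy.cluster.hierarchy.linkage as observation vectors, and the helper names minmax and cophenetic_rows_and_columns are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def minmax(m):
    """Linearly rescale a matrix to [0, 1]; a constant matrix maps to zeros."""
    lo, hi = m.min(), m.max()
    return np.zeros_like(m) if hi == lo else (m - lo) / (hi - lo)


def cophenetic_rows_and_columns(distance_matrix, method="average"):
    """Return (row_coph, col_coph), each normalized independently."""
    d = np.asarray(distance_matrix, dtype=float)
    # Assumption: each row (or column) is treated as one observation vector.
    row_link = linkage(d, method=method)
    col_link = linkage(d.T, method=method)
    # cophenet(Z) returns condensed distances; squareform expands them to a square matrix.
    row_coph = squareform(cophenet(row_link))
    col_coph = squareform(cophenet(col_link))
    # Min and max are taken separately for the row and column results.
    return minmax(row_coph), minmax(col_coph)


toy = [[1.0, 5.0, 9.0],
       [4.0, 1.0, 8.0],
       [9.0, 7.0, 2.0]]
row_coph, col_coph = cophenetic_rows_and_columns(toy)
print(row_coph.round(3))
```

Normalizing each result with its own minimum and maximum keeps the two matrices on the same [0, 1] scale even when the row and column trees have very different heights.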

cellgps.compute_cophenetic_distances_from_df_memory_opt(df: DataFrame, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, celltype_col: str = 'celltype', method: str = 'average', show_corr: bool = False, batch_size: int | None = None) → Tuple[DataFrame, DataFrame][source]¶

Same functionality as the original compute_cophenetic_distances_from_df, but reduces memory usage via batched computation.

cellgps.pick_batch_size(n_cells: int, dims: int = 2, frac: float = 0.3, hard_min: int = 50000, hard_max: int | None = None, bytes_per_row: int | None = None, safety_gb: float = 8.0, env_override_var: str = 'BATCH_SIZE_OVERRIDE') → int[source]¶

Pick a batch size that better utilizes RAM on big machines.

Key ideas:

- Allow an env override (for quick experiments).
- Subtract a fixed safety buffer (safety_gb) from available RAM.
- Make bytes_per_row configurable; provide a conservative default.
- Optional hard_max; if None, we don’t clamp by a hard cap.

Parameters¶

n_cells (int): Total number of items to process.
dims (int): Dimensionality; may influence copies inside algorithms.
frac (float): Fraction of available RAM to budget.
hard_min (int): Lower bound for stability on small RAM.
hard_max (Optional[int]): Upper bound; set None to disable hard clamping.
bytes_per_row (Optional[int]): Estimated peak bytes per row for the step. If None, pick a conservative default.
safety_gb (float): Keep this amount of RAM free regardless (OS/page cache/etc.).
env_override_var (str): If set, this env var forces the batch size (int), bypassing heuristics.

Returns¶

int: A batch size in [hard_min, n_cells] (and <= hard_max if provided).
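The heuristic can be sketched as follows. This is a guess at the shape of the logic, not the library's code: pick_batch_size_sketch is a hypothetical name, available_bytes is passed in explicitly (how the real function measures free RAM is not documented here), the 4096-byte default for bytes_per_row is invented for illustration, and applying the safety buffer before frac is an assumption.

```python
import os


def pick_batch_size_sketch(n_cells, available_bytes, frac=0.3, hard_min=50_000,
                           hard_max=None, bytes_per_row=4096, safety_gb=8.0,
                           env_override_var="BATCH_SIZE_OVERRIDE"):
    # An environment override bypasses every heuristic below.
    override = os.environ.get(env_override_var)
    if override:
        return int(override)

    # Keep safety_gb free, then budget a fraction of what remains.
    usable = max(available_bytes - safety_gb * 1024**3, 0)
    size = int((usable * frac) // bytes_per_row)

    # Clamp: never below hard_min, optionally never above hard_max,
    # and never more rows than there are cells.
    size = max(size, hard_min)
    if hard_max is not None:
        size = min(size, hard_max)
    return min(size, n_cells)


GIB = 1024**3
print(pick_batch_size_sketch(1_000_000, available_bytes=4 * GIB))
print(pick_batch_size_sketch(1_000_000, available_bytes=64 * GIB, hard_max=200_000))
```

With only 4 GiB available the safety buffer consumes the whole budget, so the sketch falls back to hard_min; with 64 GiB the memory budget exceeds hard_max and the explicit cap decides the result.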

Topology extensions¶

cellgps.compute_weighted_cophenetic_distances_from_df(df: DataFrame, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, group_col: str = 'celltype', weight_col: str | None = 'weight', min_weight: float = 0.0, method: str = 'average', show_corr: bool = False) → tuple[DataFrame, DataFrame][source]¶

cellgps.compute_weighted_searcher_findee_distance_matrix_from_df(df: DataFrame, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, group_col: str = 'celltype', weight_col: str | None = 'weight', min_weight: float = 0.0) → DataFrame[source]¶

Compute a weighted directed searcher→findee average nearest-neighbor matrix.

The weighting scheme is intentionally conservative to preserve backward compatibility with the original t_and_c logic: the nearest-neighbor geometry is unchanged, while the row-wise aggregation becomes a weighted average over source/searcher points. When every point has unit weight, the result is exactly equivalent to compute_searcher_findee_distance_matrix_from_df.
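A small sketch of the weighted aggregation for a single searcher→findee pair, showing why unit weights reproduce the unweighted mean. The function name weighted_mean_nn_distance is hypothetical, and dropping points whose weight is strictly below min_weight is an assumption about how the threshold is applied.

```python
from math import dist


def weighted_mean_nn_distance(searchers, findees, weights=None, min_weight=0.0):
    """searchers, findees: lists of (x, y); weights: one weight per searcher point."""
    if weights is None:
        weights = [1.0] * len(searchers)
    total = 0.0
    weight_sum = 0.0
    for p, w in zip(searchers, weights):
        if w < min_weight:  # assumption: points below the threshold are ignored
            continue
        # The nearest-neighbor geometry is unchanged; only the averaging is weighted.
        total += w * min(dist(p, q) for q in findees)
        weight_sum += w
    return total / weight_sum if weight_sum else float("nan")


src = [(0.0, 0.0), (0.0, 2.0)]
dst = [(3.0, 0.0)]
unweighted = weighted_mean_nn_distance(src, dst)
unit = weighted_mean_nn_distance(src, dst, weights=[1.0, 1.0])
skewed = weighted_mean_nn_distance(src, dst, weights=[3.0, 1.0])
print(unweighted, unit, skewed)
```

Giving the first searcher point three times the weight pulls the average toward its nearest-neighbor distance of 3.0, while the per-point distances themselves stay the same.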

cellgps.build_entity_points_from_expression(reference_df: DataFrame, expression_df: DataFrame, *, entities: Iterable[str] | None = None, cell_id_col: str = 'cell_id', x_col: str = 'x', y_col: str = 'y', min_weight: float = 0.0, entity_col: str = 'entity', weight_col: str = 'weight') → DataFrame[source]¶

cellgps.compute_entity_structuremap(entity_points_df: DataFrame, *, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, entity_col: str = 'entity', weight_col: str = 'weight', min_weight: float = 0.0, method: str = 'average') → DataFrame[source]¶

cellgps.compute_entity_to_cell_topology(reference_df: DataFrame, entity_points_df: DataFrame, *, x_col: str = 'x', y_col: str = 'y', z_col: str | None = None, celltype_col: str = 'celltype', entity_col: str = 'entity', weight_col: str = 'weight', min_weight: float = 0.0, method: str = 'average') → DataFrame[source]¶

Generalize transcript-by-cell topology to arbitrary weighted entities.

reference_df contains the fixed cell-type template. entity_points_df contains an entity label plus spatial points and weights. For every entity we temporarily append its weighted point cloud to the reference template, compute a weighted StructureMap, and extract the entity→celltype row.

cellgps.compute_pathway_activity_matrix(expression_df: DataFrame, pathway_definitions: Mapping[str, Any] | DataFrame, *, method: str = 'rank_mean', normalize: bool = True) → DataFrame[source]¶

cellgps.ligand_receptor_topology_analysis(*, reference_df: DataFrame | None = None, expression_df: DataFrame | None = None, lr_pairs: DataFrame, output_dir: str | PathLike[str] | None = None, adata: Any = None, entity_points_df: DataFrame | None = None, tbc_results: str | PathLike[str] | None = None, t_and_c_df: DataFrame | None = None, cluster_col: str = 'Cluster', cell_id_col: str = 'cell_id', x_col: str = 'x', y_col: str = 'y', celltype_col: str = 'celltype', ligand_col: str = 'ligand', receptor_col: str = 'receptor', prior_col: str = 'evidence_weight', structure_map: DataFrame | None = None, structure_map_df: DataFrame | None = None, anchor_mode: str = 'precomputed', expression_support_mode: str = 'pseudobulk_detection', contact_mode: str = 'strength_coverage', entity_min_weight: float = 0.0, detection_threshold: float = 0.0, k_neighbors: int = 8, radius: float | None = None, topology_method: str = 'average', top_n_pairs: int = 12, hotspot_quantile: float = 0.9, min_cross_edges: int = 50, contact_expr_threshold: str | float = 'q75_nonzero', use_raw: bool = False) → dict[str, Any][source]¶

cellgps.ligand_receptor_target_consistency(lr_scores: DataFrame, receiver_signatures: Mapping[str, Any] | DataFrame, ligand_target_prior: DataFrame, *, ligand_col: str = 'ligand', receiver_col: str = 'receiver_celltype', target_col: str = 'target', prior_weight_col: str = 'weight', signature_gene_col: str = 'gene', signature_weight_col: str = 'score') → DataFrame[source]¶

Compute a NicheNet-like downstream target consistency layer.

The default scoring is intentionally lightweight: for each ligand and receiver cell type we compute the weighted overlap between the ligand prior targets and the receiver signature genes. The output can be merged back onto the ligand_receptor_topology_analysis result table.
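One way to read "weighted overlap" is sketched below for a single ligand and receiver cell type. This is an assumption, not the documented formula: the function name target_consistency, the product of prior weight and signature score, and the division by the total prior weight are all illustrative choices, and the gene names are arbitrary examples.

```python
def target_consistency(prior_targets, signature):
    """prior_targets: {target gene: prior weight}; signature: {gene: score}."""
    # Only genes present in both the ligand prior and the receiver signature contribute.
    shared = prior_targets.keys() & signature.keys()
    overlap = sum(prior_targets[g] * signature[g] for g in shared)
    # Assumption: normalize by total prior weight so ligands with many targets are comparable.
    total_prior = sum(prior_targets.values())
    return overlap / total_prior if total_prior else 0.0


prior = {"FOS": 0.8, "JUN": 0.2, "MYC": 0.5}
receiver_signature = {"FOS": 1.0, "MYC": 0.5, "ACTB": 2.0}
score = target_consistency(prior, receiver_signature)
print(round(score, 3))
```

A table of such scores, keyed by ligand and receiver cell type, is the kind of output that could be merged back onto the ligand-receptor result table.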

cellgps.pathway_topology_analysis(*, pathway_definitions: Mapping[str, Any] | DataFrame, reference_df: DataFrame | None = None, expression_df: DataFrame | None = None, output_dir: str | PathLike[str] | None = None, adata: Any = None, tbc_results: str | PathLike[str] | None = None, t_and_c_df: DataFrame | None = None, cluster_col: str = 'Cluster', cell_id_col: str = 'cell_id', x_col: str = 'x', y_col: str = 'y', celltype_col: str = 'celltype', scoring_method: str = 'weighted_sum', view: str = 'intrinsic', structure_map: DataFrame | None = None, structure_map_df: DataFrame | None = None, anchor_mode: str = 'precomputed', pathway_modes: Sequence[str] = ('gene_topology_aggregate', 'activity_point_cloud'), primary_pathway_mode: str = 'gene_topology_aggregate', pathway_aggregate: str = 'weighted_median', activity_threshold_schedule: Sequence[float] = (0.95, 0.9, 0.8, 0.7, 0.6, 0.5), min_activity_cells: int = 50, entity_min_weight: float = 0.0, k_neighbors: int = 8, radius: float | None = None, topology_method: str = 'average', hotspot_quantile: float = 0.9, use_raw: bool = False) → dict[str, Any][source]¶

Preprocessing and input helpers¶

cellgps.load_xenium_data(folder: str, normalize: bool = True)[source]¶

Load and preprocess a Xenium run through pyXenium.io.read_xenium.

cellgps.load_xenium_table_bundle(folder: str | PathLike[str], *, cells_path: str | PathLike[str] | None = None, cell_groups_path: str | PathLike[str] | None = None, feature_matrix_path: str | PathLike[str] | None = None, normalize: bool = False, cluster_col: str = 'Clusters', cell_id_col: str = 'Barcode', x_col: str = 'x_centroid', y_col: str = 'y_centroid')[source]¶

Load a Xenium table bundle through pyXenium.io.read_xenium.

The returned object keeps the requested cluster labels in adata.obs[cluster_col] and mirrors them into adata.obs["Cluster"] for the Cell-GPS analysis API.

cellgps.merge_xenium_clusters_into_adata(sdata, xenium_dir: str, table_key: str = 'table', clustering_root: str = 'analysis/clustering', barcode_col: str = 'Barcode', cluster_col: str = 'Cluster') → Tuple['anndata.AnnData', List[str], Dict[str, float]][source]¶

Auto-collect xenium_dir/analysis/clustering/**/clusters.csv and merge the clustering columns into sdata.tables[table_key].obs. Linking prefers obs['cell_id'] and falls back to shapes index mapping if that column is unavailable. Returns (adata, list of new column names, per-column non-NA hit-rate report).

cellgps.read_visium_bin(base: Path, dataset_id: str, use_filtered: bool = True, keep_tmp: bool = False)[source]¶

Adapter for spatialdata-io 0.3.0 that reads Visium HD output containing Parquet coordinates. Does not write any files to base.

Plotting¶

cellgps.plot_cophenetic_heatmap(matrix: DataFrame, matrix_name: str | None = None, output_dir: str | None = None, output_filename: str | None = None, figsize: Tuple[float, float] | None = None, cmap: str = 'RdBu', linewidths: float = 0.5, annot: bool = False, sample: str = 'Sample', xlabel: str | None = None, ylabel: str | None = None, show_dendrogram: bool = True, quiet: bool = True, return_figure: bool = False, return_image: bool = False, dpi: int = 300)[source]¶

Draw a cophenetic heatmap (seaborn.clustermap), guaranteeing:

Text in PDF is editable
Legend position is auto-adjusted
figsize is dynamically adjusted
fontTools.subset & findfont logs are silenced

Parameters:

…existing parameters…
return_figure: Whether to return the figure object instead of saving to file.
return_image: Whether to return a high-resolution PIL image instead of the figure object.
dpi: Image DPI resolution; only effective when return_image=True.

Returns:

If return_figure=True, returns a seaborn.ClusterGrid object.
If return_image=True, returns a PIL.Image object.
Otherwise returns None.

cellgps.generate_cluster_distance_heatmap_from_adata(adata: anndata.AnnData, cluster_col: str = 'Cluster', output_dir: str | None = None, output_filename: str | None = None, figsize: tuple = (8, 8), cmap: str = 'RdBu', max_scale: float = 10, show_dendrogram: bool = True)[source]¶

Generate and save a distance heatmap from each cell cluster to its nearest cluster center.

Parameters:¶

adata (anndata.AnnData): AnnData object containing preprocessed data.
cluster_col (str, optional): Column name in adata.obs containing cluster information. Defaults to “Cluster”.
output_dir (Optional[str]): Output directory for the PDF file. Defaults to current working directory.
output_filename (Optional[str]): Output file name. If not specified, uses “clustermap_output_{sample}.pdf”.
figsize (tuple, optional): Size of the heatmap. Defaults to (8, 8).
cmap (str, optional): Colormap for the heatmap. Defaults to “RdBu”.
max_scale (float, optional): max_value parameter for sc.pp.scale, used to clip Z-scores. Defaults to 10.

Returns:¶

None

cellgps.generate_cluster_distance_heatmap_from_df(df: DataFrame, x_col: str = 'x', y_col: str = 'y', celltype_col: str = 'celltype', sample: str = 'Sample', output_dir: str | None = None, output_filename: str | None = None, figsize: tuple = (8, 8), cmap: str = 'RdBu', show_dendrogram: bool = True)[source]¶

Generate and save a distance heatmap from each cell cluster to its nearest cluster center.

Parameters:¶

df (pd.DataFrame): DataFrame containing cell data.
x_col (str, optional): Column name for x coordinates. Defaults to ‘x’.
y_col (str, optional): Column name for y coordinates. Defaults to ‘y’.
celltype_col (str, optional): Column name for cell type. Defaults to ‘celltype’.
output_dir (Optional[str]): Output directory for the PDF file. Defaults to current working directory.
output_filename (Optional[str]): Output file name. If not specified, uses “clustermap_output.pdf”.
figsize (tuple, optional): Size of the heatmap. Defaults to (8, 8).
cmap (str, optional): Colormap for the heatmap. Defaults to “RdBu”.

Returns:¶

None

cellgps.generate_cluster_distance_heatmap_from_path(base_path: str, sample: str, figsize: tuple = (8, 8), output_dir: str | None = None, show_dendrogram: bool = True)[source]¶

Generate and save a distance heatmap from each cell cluster to its nearest cluster center.

Parameters:¶

base_path (str): Base path where data is stored.
sample (str): Sample name used to specify the data folder.
output_dir (Optional[str]): Output directory for the PDF file. Defaults to current working directory.

Returns:¶

None

cellgps.circle_heatmap(bg_df: DataFrame, circle_df: DataFrame, *, cmap: str = 'RdBu', size_exponent: float = 1.0, circle_fill: str = 'white', circle_edge: str = 'black', circle_edge_lw: float = 0.5, add_legend: bool = True, legend_title: str = 'Transcript Percentage (%)', figsize: tuple = (8, 6), ax: Axes = None)[source]¶

Draw a combined heatmap and circles plot:

bg_df: scores between 0–1, represented with red-white-blue;
circle_df: percentages 0–100 (%), encoded as circle area;
0% draws no circle, and 100% maps exactly to a circle whose diameter equals the heatmap cell size;
The legend only shows five percentages: [5, 25, 45, 65, 85].
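Encoding a percentage as circle area means the diameter grows with the square root of the percentage. The sketch below illustrates that relationship only; circle_diameter and cell_size are hypothetical names, and the effect of size_exponent is left out because the reference does not describe it.

```python
from math import sqrt


def circle_diameter(pct, cell_size=1.0):
    """Diameter of a circle whose area is proportional to pct (0 to 100)."""
    pct = min(max(pct, 0.0), 100.0)  # clamp out-of-range inputs
    # Area scales with pct, so diameter scales with sqrt(pct);
    # 100% gives a diameter equal to cell_size.
    return cell_size * sqrt(pct / 100.0)


legend_levels = [5, 25, 45, 65, 85]
legend_diameters = [circle_diameter(p) for p in legend_levels]
print([round(d, 3) for d in legend_diameters])
```

Under this mapping a 25% entry is drawn at half the full diameter, which is why small percentages remain visible in the plot.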
