Reproducing Manuscript Code

This page describes a practical workflow for using the Cell-GPS manuscript code. The goal is to make the computational source for each figure and table easy to inspect, rerun where data access permits, and adapt to related spatial omics datasets.

Quick Start

Install the package and clone the repository:

pip install Cell-GPS
git clone https://github.com/hutaobo/cellgps.git
cd cellgps

For local development, install the package in editable mode from the repository root:

pip install -e .

The manuscript notebooks are under:

Cell-GPS manuscript code/
  main_figures/
  supplementary_figures/
  supplementary_tables/

Open the notebook corresponding to the figure or table of interest, inspect the first markdown cell for provenance, then adapt the data paths if needed.
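The provenance cell can also be read without opening Jupyter. The following is a minimal sketch, assuming only that the notebooks follow the standard .ipynb JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source"). The helper name first_markdown_cell is hypothetical and is not part of Cell-GPS.

```python
import json
from pathlib import Path


def first_markdown_cell(notebook_path):
    """Return the text of the first markdown cell, or None if there is none.

    Hypothetical helper for illustration; not part of the Cell-GPS package.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # The notebook format stores source either as one string
            # or as a list of line strings; normalize to one string.
            return source if isinstance(source, str) else "".join(source)
    return None
```

Calling first_markdown_cell on a notebook path and printing the result should show the provenance note, provided it is indeed the first markdown cell of that notebook.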

Data Availability and Paths

Large raw spatial omics datasets are not bundled with the repository. The notebooks therefore preserve the original absolute paths used during analysis, such as Y:\long\... on Windows workstations and /data/taobo.hu/... or /mnt/taobo.hu/... on A100/Linux systems. These paths resolve only where the same storage is mounted.

When rerunning a notebook, use one of these strategies:

  • Run on the same workstation or server where the original paths are mounted.

  • Replace the original path variables with local copies of the same datasets.

  • Use the notebook as a code template and substitute your own Xenium, transcript-coordinate, or cell-coordinate data.
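For the second strategy, one way to avoid editing every path by hand is to rewrite the original prefix onto a local data root. This is a rough sketch, not something the notebooks ship with: the function remap_path, the PREFIX_MAP dictionary, and the local root /home/me/cellgps_data are all hypothetical, and the mapping should be adjusted to wherever your copies actually live.

```python
# Hypothetical mapping from the original path prefixes to local copies.
# The left-hand prefixes come from the notebooks; the right-hand roots
# are placeholders for illustration only.
PREFIX_MAP = {
    "Y:\\long": "/home/me/cellgps_data/long",
    "/data/taobo.hu": "/home/me/cellgps_data",
    "/mnt/taobo.hu": "/home/me/cellgps_data",
}


def remap_path(original, prefix_map=PREFIX_MAP):
    """Rewrite an original notebook path onto a local data root.

    Paths that match no known prefix are returned unchanged.
    """
    # Compare with forward slashes so Windows and POSIX paths match alike.
    normalized = original.replace("\\", "/")
    for old_prefix, new_root in prefix_map.items():
        old = old_prefix.replace("\\", "/").rstrip("/")
        # Match whole path components only, so "/data/taobo.hu" does not
        # accidentally match "/data/taobo.hu_backup".
        if normalized == old or normalized.startswith(old + "/"):
            return new_root.rstrip("/") + normalized[len(old):]
    return original
```

In a notebook, wrapping each data path variable in remap_path(...) keeps the original path visible for provenance while pointing the read at the local copy.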

The final multi-panel manuscript figures were usually assembled from generated PDF/PNG panels. The notebooks focus on the computational source for those panels rather than reproducing Illustrator or PowerPoint assembly exactly.

Core Package Entry Points

Most manuscript notebooks use one or more of these public entry points:

  • compute_cophenetic_distances_from_df for coordinate-table inputs.

  • compute_cophenetic_distances_from_adata for AnnData/Xenium workflows.

  • plot_cophenetic_heatmap for StructureMap heatmaps.

  • transcript_by_cell_analysis for transcript-to-cell spatial topology.

  • compute_cophenetic_distances_from_df_memory_opt for large tables.

  • plot_circular_dendrogram_pycirclize for circular hierarchy plots.
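Before running a notebook, it can help to confirm that the installed version exposes the entry points listed above. The sketch below deliberately avoids calling any of them, because their signatures are not described on this page. It only checks whether each name is an attribute of a given module. The helper find_entry_points is hypothetical, and the import name of the package is an assumption you must supply yourself; the functions may also live in submodules rather than at the top level.

```python
import importlib

# Entry-point names as listed on this page.
ENTRY_POINTS = [
    "compute_cophenetic_distances_from_df",
    "compute_cophenetic_distances_from_adata",
    "plot_cophenetic_heatmap",
    "transcript_by_cell_analysis",
    "compute_cophenetic_distances_from_df_memory_opt",
    "plot_circular_dendrogram_pycirclize",
]


def find_entry_points(module_name, names=ENTRY_POINTS):
    """Report which names are attributes of the imported module.

    Hypothetical helper: module_name is whatever import name the
    installed package uses, which this page does not specify.
    Only top-level attributes are checked, not submodules.
    """
    module = importlib.import_module(module_name)
    return {name: hasattr(module, name) for name in names}
```

A name reported as False is not necessarily missing from the package. It may simply be defined in a submodule, in which case the notebook's own import statements show the right location.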

Reproducibility Status

The curated notebooks are designed to expose the analysis logic and code provenance. Not all of them are guaranteed to execute on a fresh public machine, because some require large datasets, local manuscript outputs, or server-side intermediate files. The expected status by category is:

  • Synthetic benchmark notebooks are the easiest to rerun when dependencies are installed, because they generate benchmark patterns programmatically.

  • Mouse pup, lymph node, pulmonary fibrosis, SSc, and TNBC notebooks require access to the corresponding public or local processed spatial omics data.

  • DST-GNN notebooks require the recovered DST-GNN release workspace and its flattened SSS input tables.

  • Supplementary table notebooks require the intermediate benchmark or transcript-by-cell result tables used for the manuscript.

Validation

For a local documentation check:

python -m sphinx -b html docs docs/_build/html

For notebook validation without rerunning heavy computations, use nbformat to confirm that the notebook files are structurally valid. GitHub rendering also expects notebooks to include stable cell IDs, which these curated notebooks now include.
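A structural check along these lines might look as follows. It is a sketch under stated assumptions: nbformat must be installed separately, the helper names find_notebooks and validate_notebook are hypothetical, and the check only validates each file against the notebook schema without executing any cell.

```python
from pathlib import Path


def find_notebooks(root):
    """Return all .ipynb files under root, skipping checkpoint copies."""
    return sorted(
        p for p in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


def validate_notebook(path):
    """Return None if the notebook is schema-valid, else the error text."""
    # Third-party dependency, imported lazily so that notebook discovery
    # still works on machines where nbformat is not installed.
    import nbformat

    # NO_CONVERT keeps the notebook in its stored format version.
    nb = nbformat.read(str(path), as_version=nbformat.NO_CONVERT)
    try:
        nbformat.validate(nb)
    except nbformat.ValidationError as err:
        return str(err)
    return None
```

Looping validate_notebook over find_notebooks("Cell-GPS manuscript code") from the repository root would list any file that fails the schema. For notebooks stored as format 4.5 or later, the schema also requires a cell ID on every cell, so a missing ID shows up as a validation error.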