===========================
Reproducing Manuscript Code
===========================

This page describes a practical workflow for using the Cell-GPS manuscript code.
The goal is to make the computational source for each figure and table easy to
inspect, rerun where data access permits, and adapt to related spatial omics
datasets.

Quick Start
===========

Install the package and clone the repository:

.. code-block:: console

   pip install Cell-GPS
   git clone https://github.com/hutaobo/cellgps.git
   cd cellgps

For editable local development:

.. code-block:: console

   pip install -e .

The manuscript notebooks are under:

.. code-block:: text

   Cell-GPS manuscript code/
     main_figures/
     supplementary_figures/
     supplementary_tables/

Open the notebook corresponding to the figure or table of interest, inspect the
first markdown cell for provenance, then adapt the data paths if needed.

Data Availability and Paths
===========================

Large raw spatial omics datasets are not bundled with the repository. The
notebooks therefore preserve the original paths used during analysis, such as
``Y:\long\...`` on Windows workstations and ``/data/taobo.hu/...`` or
``/mnt/taobo.hu/...`` on A100/Linux systems.

When rerunning a notebook, use one of these strategies:

* Run on the same workstation or server where the original paths are mounted.
* Replace the original path variables with local copies of the same datasets.
* Use the notebook as a code template and substitute your own Xenium,
  transcript-coordinate, or cell-coordinate data.

The final multi-panel manuscript figures were usually assembled from generated
PDF/PNG panels. The notebooks focus on the computational source for those panels
rather than reproducing Illustrator or PowerPoint assembly exactly.

Recommended Reading Order
=========================

For readers who want to understand the method before rerunning full manuscript
analyses:

1. Read :doc:`manuscript_overview`.
2. Run the simple coordinate-table example in :doc:`usage`.
3. Inspect ``Figure_1_synthetic_benchmark.ipynb`` for the benchmark logic.
4. Inspect the figure-specific notebook listed in :doc:`manuscript_code_index`.
5. Use ``docs/cellgps_science_manuscript_code_inventory.md`` for deeper
   provenance notes when a notebook refers to legacy local or server paths.

Core Package Entry Points
=========================

Most manuscript notebooks use one or more of these public entry points:

* ``compute_cophenetic_distances_from_df`` for coordinate-table inputs.
* ``compute_cophenetic_distances_from_adata`` for AnnData/Xenium workflows.
* ``plot_cophenetic_heatmap`` for StructureMap heatmaps.
* ``transcript_by_cell_analysis`` for transcript-to-cell spatial topology.
* ``compute_cophenetic_distances_from_df_memory_opt`` for large tables.
* ``plot_circular_dendrogram_pycirclize`` for circular hierarchy plots.

Reproducibility Status
======================

The curated notebooks are designed to expose the analysis logic and code
provenance. They are not all guaranteed to execute on a fresh public machine
because some require large datasets, local manuscript outputs, or server-side
intermediate files.

This is the expected status by category:

* Synthetic benchmark notebooks are the easiest to rerun when dependencies are
  installed, because they generate benchmark patterns programmatically.
* Mouse pup, lymph node, pulmonary fibrosis, SSc, and TNBC notebooks require
  access to the corresponding public or local processed spatial omics data.
* DST-GNN notebooks require the recovered DST-GNN release workspace and its
  flattened SSS input tables.
* Supplementary table notebooks require the intermediate benchmark or
  transcript-by-cell result tables used for the manuscript.

Validation
==========

For a local documentation check:

.. code-block:: console

   python -m sphinx -b html docs docs/_build/html

For notebook validation without rerunning heavy computations, use ``nbformat``
to confirm that the notebook files are structurally valid.
GitHub rendering also expects notebooks to include stable cell IDs, which these curated notebooks now include.
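
The structural check described above could be scripted roughly as follows. This
is a minimal sketch, not part of the Cell-GPS package: the helper names
``cells_missing_ids`` and ``check_notebook`` are hypothetical, and the notebook
root is assumed to match the directory shown in Quick Start. It uses
``nbformat.read`` and ``nbformat.validate`` for schema validation and then
reports any cells that lack an ``id`` field.

.. code-block:: python

   from pathlib import Path


   def cells_missing_ids(notebook: dict) -> list[int]:
       """Return the indices of cells that lack a non-empty ``id`` field."""
       return [
           index
           for index, cell in enumerate(notebook.get("cells", []))
           if not cell.get("id")
       ]


   def check_notebook(path: Path) -> list[int]:
       """Validate one notebook against its schema; return cells without IDs.

       Raises ``nbformat.ValidationError`` if the file is structurally invalid.
       """
       # Third-party dependency, imported here so the helper above stays
       # usable with the standard library alone.
       import nbformat

       # NO_CONVERT keeps the notebook at the format version stored on disk.
       notebook = nbformat.read(str(path), as_version=nbformat.NO_CONVERT)
       nbformat.validate(notebook)
       return cells_missing_ids(notebook)


   if __name__ == "__main__":
       # Assumed location; adjust to wherever the notebooks live locally.
       root = Path("Cell-GPS manuscript code")
       for notebook_path in sorted(root.rglob("*.ipynb")):
           missing = check_notebook(notebook_path)
           status = "ok" if not missing else f"cells without IDs: {missing}"
           print(f"{notebook_path}: {status}")

Because the script only reads and validates the notebook JSON, it does not
execute any cells and does not need access to the original datasets.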