Computational Notebooks for "Morphology-Aware Profiling of Highly Multiplexed Tissue Images using Variational Autoencoders"
Gregory J. Baker1,2,3,&,*,#, Edward Novikov1,4,*, Shannon Coy1,2,5, Yu-An Chen1,2, Clemens B. Hug1, Zergham Ahmed1,4, Sebastián A. Cajas Ordóñez4, Siyu Huang4,%, Clarence Yapp1, Artem Sokolov1, Hanspeter Pfister4, Peter K. Sorger1,2,3,#
1 Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA
2 Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA
3 Department of Systems Biology, Harvard Medical School, Boston, MA
4 Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA
5 Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
& Current affiliation: Division of Oncological Sciences, Knight Cancer Institute, Oregon Health & Science University, Portland, OR
% Current affiliation: Visual Computing Division, School of Computing, Clemson University, Clemson, SC
*Co-first Authors: G.J.B., E.N.
#Corresponding Authors: gbak7696@gmail.com (G.J.B.), peter_sorger@hms.harvard.edu (P.K.S.)
Spatial proteomics (highly multiplexed tissue imaging) provides unprecedented insight into the types, states, and spatial organization of cells within preserved tissue environments. To enable single-cell analysis, high-plex images are typically segmented using algorithms that assign marker signals to individual cells. However, conventional segmentation is often imprecise and susceptible to signal spillover between adjacent cells, interfering with accurate cell type identification. Segmentation-based methods also fail to capture the morphological detail that histopathologists rely on for disease diagnosis and staging. Here, we present a method that combines unsupervised, pixel-level machine learning using autoencoders with traditional segmentation to generate single-cell data that captures information on protein abundance, morphology, and local neighborhood in a manner analogous to human experts while overcoming the problem of signal spillover. The result is a more accurate and nuanced characterization of cell types and states than segmentation-based analysis alone.
Python code in this GitHub repository is organized into Jupyter notebooks used to generate the figures shown in the paper. To run the code, first clone this repository onto your computer by opening a terminal window and entering the following command:
git clone https://github.com/labsyspharm/vae-paper.git
Next, change directories into the top level directory of the cloned repository and create and activate a dedicated Conda environment containing the necessary Python libraries for running the code:
cd <path/to/cloned/repo>
conda env create -f environment.yml
conda activate morphaeus-paper
If conda is not already installed, you can download it by following the installation instructions in the official Conda documentation (https://docs.conda.io).
To browse the Jupyter notebooks, change directories to the src folder and launch Jupyter Lab with the following commands:
cd src
jupyter lab
To re-run the Jupyter notebooks, input data must first be downloaded from our public Amazon S3 bucket. This can be done by running the download.py script located in the src folder. In addition to the required input data, this script also downloads a folder of precomputed output files to serve as a reference (output_reference):
# from the top level directory
python src/download.py
Note: ~313GB of storage space is required to download the complete file set.
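To confirm there is enough room before starting the download, here is a quick free-space check in Python (a convenience sketch, not part of the repository; run it from the drive where the data will be stored):

# Free-space check before downloading the ~313GB file set.
import shutil

free_gb = shutil.disk_usage(".").free / 1e9  # free bytes on this volume, in GB
print(f"{free_gb:.0f} GB free")
if free_gb < 313:
    print("Warning: not enough space for the complete file set.")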
To re-run any of the notebooks in Jupyter Lab, first double-click a .ipynb file in the file browser at the left of the screen; the notebook will open at the right. Then click the double-arrow button at the top of the notebook to restart the kernel and run all cells. Notebook output will be saved to a folder called output at the top level of the repository.
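Notebooks can also be executed headlessly. A minimal Python sketch using the nbformat and nbclient libraries (assuming both are installed in the active environment; the notebook filename below is a placeholder for any .ipynb file in src):

# Headless notebook execution; "example.ipynb" is a placeholder filename.
import nbformat
from nbclient import NotebookClient

nb = nbformat.read("example.ipynb", as_version=4)
NotebookClient(nb, kernel_name="python3").execute()  # runs all cells in order
nbformat.write(nb, "example_executed.ipynb")         # save the executed copy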
MORPHÆUS source code is freely available for academic re-use under the MIT license on GitHub and is archived on Zenodo.
To demo the data analysis pipeline, be sure that the input data files have first been downloaded as described above, then change directories to the demo directory and run the following commands:
cd demo
vae config.yml
This will execute the pipeline on a small subsample of data from the CyCIF-1A image presented in the paper, demonstrating all major modules, from single-cell CSV subsampling and image patch generation to VAE model training, plot visualization, and concept saliency analysis.
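For orientation, below is a minimal sketch of a convolutional VAE of the kind trained on image patches, written with Keras. This is not the MORPHÆUS implementation: the 64x64 patch size, single channel, and 16-dimensional latent space are placeholder assumptions.

# Minimal convolutional VAE sketch (illustrative only; hyperparameters are
# placeholder assumptions, not the values used by MORPHAEUS).
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

LATENT_DIM = 16

class Sampling(layers.Layer):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder: image patch -> parameters of a Gaussian in latent space
enc_in = keras.Input(shape=(64, 64, 1))
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(enc_in)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
z_mean = layers.Dense(LATENT_DIM)(x)
z_log_var = layers.Dense(LATENT_DIM)(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(enc_in, [z_mean, z_log_var, z], name="encoder")

# Decoder: latent vector -> reconstructed patch
dec_in = keras.Input(shape=(LATENT_DIM,))
x = layers.Dense(16 * 16 * 64, activation="relu")(dec_in)
x = layers.Reshape((16, 16, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
dec_out = layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid")(x)
decoder = keras.Model(dec_in, dec_out, name="decoder")

class VAE(keras.Model):
    """Jointly optimizes pixel reconstruction loss and the KL divergence
    that regularizes the latent space."""
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            recon = self.decoder(z)
            recon_loss = tf.reduce_mean(tf.reduce_sum(
                keras.losses.binary_crossentropy(data, recon), axis=(1, 2)))
            kl_loss = -0.5 * tf.reduce_mean(tf.reduce_sum(
                1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
            loss = recon_loss + kl_loss
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": loss}

vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
# vae.fit(patches, epochs=100, batch_size=32)  # patches: float array (N, 64, 64, 1) in [0, 1]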
Note: demo results will differ from those shown in the paper due to the smaller training dataset and fewer training epochs. Each epoch takes roughly 30 seconds to 1 minute running locally on CPUs, and ~100 epochs are required before learned reconstructions begin to resemble cells and the data start to form discrete clusters in feature space. As a convenience, lightly pre-trained encoder and decoder networks are provided so that the pipeline skips the VAE model training step. To train a new model instead, comment out the encoder.hdf5 and decoder.hdf5 files, as well as the TRAIN_VAE.txt checkpoint file, before executing the pipeline.
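The provided networks can also be inspected directly. A sketch using Keras (this assumes the .hdf5 files are full serialized Keras models rather than weights-only checkpoints, and that paths are relative to the demo directory; models containing custom layers may additionally require the custom_objects argument):

# Load and inspect the lightly pre-trained encoder/decoder networks.
# Paths and serialization format are assumptions; adjust to your local layout.
from tensorflow import keras

encoder = keras.models.load_model("encoder.hdf5", compile=False)
decoder = keras.models.load_model("decoder.hdf5", compile=False)
encoder.summary()  # print the layer architecture of the encoder
decoder.summary()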
This GitHub repository will be archived on Zenodo following publication of the manuscript.
This work was supported by Ludwig Cancer Research and the Ludwig Center at Harvard (P.K.S., S.S.), the Gray Foundation, and by NIH NCI grants U01-CA284207 and U2C-CA233262. S.S. is supported by the BWH President’s Scholars Award. Results shown in this study are in part based upon data generated by the Human Tumor Atlas Network (HTAN, https://humantumoratlas.org/).
Baker GJ, Novikov E, et al. Morphology-Aware Profiling of Highly Multiplexed Tissue Images using Variational Autoencoders. bioRxiv (2025). https://doi.org/10.1101/2025.06.23.661064