Spatial transcriptomics dreaming... and data storage needs

“If I can dream, in a few years we’ll have spatiotemporal single-cell ’omics in living tissues.” – Sten Linnarsson, Karolinska Institute[1]

When we know exactly what is going on in every tissue and cell of the living human body, we will probably figure out how to cure aging.

But how much data storage is necessary to track every protein and non-coding RNA in the human body?

Here's a quick estimate:

  • 30 trillion cells per human
  • 80,000 unique non-coding RNAs
  • 20,000 unique proteins
  • = 100,000 species to measure per cell, so 3x10^18 measurements per human per snapshot
  • Now, measure this hourly for a year (8,760 hrs/yr)
  • = 2.6x10^22 measurements per human per year
  • Call it 4 bytes per number (float32)
  • = 1x10^23 bytes per human per year
  • Convert this into zettabytes at 10^21 bytes per zettabyte
  • = about 100 zettabytes per year per human
  • Measure this across 10 billion humans
  • = about 1x10^12 zettabytes for humanity per year

So we need about 100 zettabytes per human per year, and 10 billion times that (about 10^12 zettabytes) for all of humanity. Unfortunately, as a species we only captured about 100 zettabytes of data in all of 2021[2], roughly what a single person would require. This math is probably wrong... but probably at least directionally right.
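
For anyone who wants to poke at the assumptions, here is a minimal Python sketch of this kind of back-of-envelope estimate. It assumes one measurement per molecular species per cell (so the RNA and protein counts are added, not multiplied), hourly sampling, and one float32 per measurement. The function name, parameter names, and dictionary keys are made up for illustration.

```python
def storage_estimate(
    cells_per_human=30 * 10**12,   # 30 trillion cells
    ncrna_species=80_000,          # unique non-coding RNAs
    protein_species=20_000,        # unique proteins
    samples_per_year=8_760,        # hourly sampling for one year
    bytes_per_value=4,             # float32
    humans=10 * 10**9,             # 10 billion people
):
    """Rough storage needed to track every species in every cell."""
    bytes_per_zettabyte = 10**21

    # Assumption: one measurement per species per cell per snapshot.
    species_per_cell = ncrna_species + protein_species
    per_snapshot = cells_per_human * species_per_cell
    per_year = per_snapshot * samples_per_year
    bytes_per_human_year = per_year * bytes_per_value

    return {
        "measurements_per_snapshot": per_snapshot,
        "measurements_per_human_per_year": per_year,
        "bytes_per_human_per_year": bytes_per_human_year,
        "zb_per_human_per_year": bytes_per_human_year / bytes_per_zettabyte,
        "zb_for_humanity_per_year": bytes_per_human_year * humans / bytes_per_zettabyte,
    }


if __name__ == "__main__":
    for name, value in storage_estimate().items():
        print(f"{name}: {value:.3g}")
```

With these defaults it comes out to roughly 105 zettabytes per person per year. Changing `samples_per_year` or `bytes_per_value` shows how sensitive the total is to sampling rate and precision.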

We can't store this much data yet, and of course we can't capture it yet either. But single-cell spatial transcriptomic methods will keep improving rapidly, as will data storage and data science methods. For example, we may get tissue-level live 'omics in the next few decades.


[1] Marx, V. Method of the Year: spatially resolved transcriptomics. Nat Methods 18, 9–14 (2021). https://doi.org/10.1038/s41592-020-01033-y

[2] "Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2025." Statista.com. Accessed June 24, 2022. https://www.statista.com/statistics/871513/worldwide-data-created/