Spatial transcriptomics (ST) provides a molecularly rich description of tissue organization, enabling the unsupervised discovery of 'tissue niches': spatially coherent regions characterized by distinct cell-type compositions and functions. These niches are relevant to both fundamental biological research and clinical interpretation. However, ST remains costly and its data scarce, whereas H&E histology is abundant but carries a less granular molecular signal.
To address this disparity, an approach is proposed that uses paired ST and H&E data to transfer transcriptomics-derived niche structure to a histology-only model. The transfer is achieved through a cross-modal knowledge distillation framework, in which the transcriptomic modality guides the histology model during training only.
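As an illustration of what such a distillation objective could look like, the sketch below computes a mean KL divergence between soft niche assignments from a transcriptomics-derived teacher and the niche distribution predicted by a histology student. The loss form, the temperature parameter, and the names `softmax` and `distillation_loss` are assumptions made for illustration; the text above does not specify the actual objective.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw student scores into a probability distribution over niches."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_probs, student_logits, temperature=1.0, eps=1e-12):
    """Mean KL(teacher || student) over spots (hypothetical objective).

    teacher_probs:  per-spot niche probabilities derived from transcriptomics.
    student_logits: per-spot niche scores predicted from histology features.
    """
    if len(teacher_probs) != len(student_logits):
        raise ValueError("teacher and student must cover the same spots")
    total = 0.0
    for p, logits in zip(teacher_probs, student_logits):
        q = softmax(logits, temperature)
        # Terms with zero teacher probability contribute nothing to the KL sum.
        total += sum(
            pi * (math.log(pi + eps) - math.log(qi + eps))
            for pi, qi in zip(p, q)
            if pi > 0
        )
    return total / len(teacher_probs)
```

Under this assumption, the loss is zero when the student reproduces the teacher's niche distribution and grows as the two diverge; only the student is needed once training is complete.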
Across multiple tissue types and disease contexts, the distilled model shows substantially higher agreement with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features. The model also recovers biologically meaningful neighborhood compositions, a finding corroborated by cell-type analysis. Once trained, the framework can be applied to unseen tissue regions using only histology images, with no transcriptomic input required at inference. This makes analysis of tissue organization more accessible and efficient.
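Agreement between two niche labelings can be quantified in several ways. The sketch below uses the adjusted Rand index (ARI) purely as an example; the text above does not state which agreement measure was used, and the function name `adjusted_rand_index` and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same spots."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same spots")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count pairs of spots grouped together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial and identical
    return (together_both - expected) / (max_index - expected)


# Toy example: niche labels from transcriptomics vs. labels predicted from histology.
transcriptomic_niches = [0, 0, 1, 1, 2, 2]
histology_niches = [1, 1, 0, 0, 2, 2]  # same partition, different label names
score = adjusted_rand_index(transcriptomic_niches, histology_niches)
```

ARI is invariant to how niches are numbered, which matters here because unsupervised niche labels from different models carry no shared naming.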