Mass spectrometry imaging (MSI) allows researchers to inspect the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra from spatial locations across its surface, producing hundreds or thousands of molecular ion images. While MSI promises exciting new insights in biomedical applications such as cancer research, the complexity of MSI datasets poses substantial challenges for visualization and statistical analysis. Improvements in instrumentation have led to rapid increases in mass and spatial resolution, producing larger datasets and file sizes and further compounding the analytic challenges.
These challenges must be met by an evolution in methods for visualization and statistical analysis of MSI experiments. However, despite the proliferation of machine learning algorithms and ad hoc data analysis and visualization tools for MSI, the development and adoption of appropriate statistical methods and reproducible experimental design have lagged behind. Many experiments with otherwise high-quality data still suffer from inadequate sample sizes or flawed experimental design, and few methods for statistical analysis exist.
We will break down the major statistical challenges of MSI analysis into three categories of analytic goals: (1) segmentation, (2) classification, and (3) class comparison. For segmentation, we will demonstrate spatial shrunken centroids, a method that performs simultaneous segmentation and selection of important ions. For classification, we present our current work on using multiple-instance learning in the presence of uncertain class labels. For class comparison, we present a novel single-ion segmentation method whose output can be used as input to a statistical testing procedure. These methods are implemented in the open-source R package Cardinal, which provides a full workflow of data import, pre-processing, visualization, and statistical analysis for MSI experiments.