Co-fractionation mass spectrometry (CF-MS) is a method for analysing endogenous, unmanipulated protein complexes on a broad scale in a single experiment. CF-MS involves extensive biochemical fractionation of protein complexes using one or more non-denaturing chromatographic techniques, such as size exclusion chromatography (SEC), followed by quantitative proteomics of each fraction. Because subunits of the same intact complex co-elute, they are expected to have highly correlated fractionation profiles.
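As a minimal illustration of what "correlated fractionation profiles" means in practice, the sketch below computes a Pearson correlation between elution profiles. The protein names and intensity values are hypothetical and chosen only for illustration, and Pearson correlation is just one of many possible metrics, not necessarily the one used in this work.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length elution profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2 fractions)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)

# Hypothetical protein intensities across 8 consecutive SEC fractions.
profiles = {
    "subunit_A": [0, 1, 5, 9, 4, 1, 0, 0],
    "subunit_B": [0, 2, 6, 10, 5, 1, 0, 0],  # co-elutes with subunit_A
    "unrelated": [0, 0, 0, 1, 2, 6, 9, 3],   # elutes in later fractions
}

r_same = pearson(profiles["subunit_A"], profiles["subunit_B"])
r_diff = pearson(profiles["subunit_A"], profiles["unrelated"])
print(f"A vs B: {r_same:.3f}, A vs unrelated: {r_diff:.3f}")
```

In this toy example the two co-eluting subunits score close to 1, whereas the protein that peaks in later fractions scores below zero.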
Despite its demonstrated utility (1, 2), best-practice approaches for CF-MS remain undefined. Here we gain insight into how best to collect and interpret CF-MS data by benchmarking CF-MS datasets against gold standard complexes in Saccharomyces cerevisiae, one of the few organisms for which reference libraries of gold standard complexes with high proteome coverage exist.
By benchmarking experimental and modelled CF-MS datasets, we find that co-analysis of data from complementary biochemical fractionation methods (e.g. using Fisher’s combined probability test) identifies complexes more efficiently than analysis of any single fractionation method alone. Systematic identification of gold standard complexes using 17 correlation metrics indicates that some metrics (e.g. Spearman correlation) are more effective than others (e.g. mutual information).
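To make the co-analysis idea concrete, the following sketch applies Fisher's combined probability test to p-values for one protein pair obtained from different fractionation methods. It assumes that each method yields an independent p-value for the pair; how such p-values were derived in this work is not stated here, and the function name and example values are hypothetical. Fisher's statistic is X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; because 2k is even, the tail probability has a closed form and no external library is needed.

```python
import math

def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i).
    """
    if not p_values or any(not (0 < p <= 1) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

# Hypothetical p-values for one protein pair from two fractionation methods.
stat, p_combined = fisher_combined([0.05, 0.05])
print(f"statistic = {stat:.3f}, combined p = {p_combined:.4f}")
```

Two individually borderline p-values combine into stronger evidence than either provides alone, which is the intuition behind pooling complementary fractionation methods.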
Many fractionation profiles that could not be benchmarked were nonetheless highly correlated, and were thus possibly derived from novel complexes. Principal component analysis of gold standard and putative novel complexes indicated that novel complexes frequently elute in later SEC fractions, and are therefore often small. To test whether orthogonal data (e.g. Gene Ontology, GO) assist in the prediction of these novel complexes, the Extending ‘Guilt-by-Association’ by Degree (EGAD) R package (3) was used. These analyses indicated that identification of gold standard complexes is likely to benefit from the integration of GO data, whereas prediction of novel complexes is not. This suggests that orthogonal experimental approaches (e.g. cross-linking mass spectrometry) may be required to validate novel complexes in CF-MS datasets.