Mass spectrometry (MS)-based quantitative proteomics depends on accurate and precise feature detection. Reliable extraction of feature characteristics has a substantial impact on the number and quality of peptides and proteins identified and quantified.
Feature detection is inherently difficult because MS proteomics data from biological samples is voluminous, complex and imbued with chemical and electronic noise. This combination of complexity and noise presents a daunting hurdle for adapting algorithms to new technology, because any improvement in instrument sensitivity and speed also increases the size and complexity of the data and the types of noise detected.
While ion-mobility MS is not a new technology, the introduction of the Bruker timsTOF Pro has made the technology commercially available. The instrument’s dual TIMS stages provide improved sensitivity relative to conventional drift tubes. The increased size and complexity of the four-dimensional data produced by the instrument, though, is a significant added challenge for existing feature detection algorithms.
In this work, we present a new data processing pipeline for feature detection, peptide and protein identification, and quantitation from timsTOF data. The algorithm focuses on performing feature detection on a patch of data where the instrument has performed a fragmentation event. The pipeline is highly parallelisable to reduce analysis times. We also introduce a step in the pipeline where we perform targeted feature detection based on the peptide features identified in the experiment, and use machine learning to propagate identifications across LC-MS runs and to reduce missing values.
To test this pipeline, we used a set of samples from a HeLa:E. coli mixture in which the proteomes were mixed in 1:1 or 1:3 ratios and analysed using a 15-minute gradient. The analysis identified 28,186 peptides (q-value ≤ 0.01). For comparison, MaxQuant identified 27,901 peptides. Our method also yields fewer missing values.