Oral Presentation 25th Annual Lorne Proteomics Symposium 2020

Strategies to improve reproducibility of large-scale and longitudinal proteomics (#2)

Rebecca C Poulos 1, Peter G Hains 1, Rohan Shah 1, Natasha Lucas 1, Dylan Xavier 1, Srikanth Manda 1, Asim Anees 1, Jennifer MS Koh 1, Sadia Mahboob 1, Max Wittman 1, Steven G Williams 1, Erin K Sykes 1, Michael Hecker 1, Michael Dausmann 1, Merridee A Wouters 1, Keith Ashman 2, Jean Yang 3, Peter Wild 4,5, Anna deFazio 6,7,8, Rosemary Balleine 1, Brett Tully 1, Ruedi Aebersold 9,10, Terence P Speed 11,12, Yansheng Liu 13,14, Roger R Reddel 1, Phillip J Robinson 1, Qing Zhong 1
  1. Children's Medical Research Institute, University of Sydney, Westmead, NSW, Australia
  2. Sciex, 2 Gilda Court, Mulgrave, VIC, Australia
  3. School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  4. Dr. Senckenberg Institute of Pathology, University Hospital Frankfurt, Frankfurt am Main, Germany
  5. Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  6. Centre for Cancer Research, Westmead Institute for Medical Research, Westmead, NSW, Australia
  7. Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
  8. Department of Gynaecological Oncology, Westmead Hospital, Westmead, NSW, Australia
  9. Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
  10. Faculty of Science, University of Zürich, Zürich, Switzerland
  11. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  12. Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
  13. Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA
  14. Yale Cancer Biology Institute, Yale University, West Haven, CT, USA

Reproducibility of research results is the bedrock of experimental science. Reproducibility is particularly difficult to achieve in large-scale studies on inherently variable clinical samples. SWATH mass spectrometry (MS) has emerged as a robust proteomic method with the capacity to acquire data at high sample throughput. However, the reproducibility of SWATH-MS proteomic measurements of human tissue samples, acquired across multiple instruments over time in a single laboratory, has not been established. We aimed to assess the reproducibility of industrial- and clinical-scale SWATH-MS and to develop computational methods for improving quantitative accuracy. To this end, we performed 1,560 SWATH-MS runs of eight samples comprising a dilution series of prostate cancer tissue in fixed proportion (50%), with a variable fraction of ovarian cancer tissue (3–50%) offset by yeast cells, and a control cell line. Replicates were run on six mass spectrometers operating continuously with varying maintenance schedules over four months, interspersed with more than 5,000 runs from unrelated studies. We first applied a normalisation strategy that utilises negative controls and replicates to remove unwanted variation and elevate biological signal with greater success than existing approaches frequently applied in proteomics. We next developed a strategy for replacing missing values in normalised data by leveraging measurements acquired from replicates spanning multiple instruments. We integrated these new computational modules into a pipeline called ProNorM (Proteomics Normalisation and Missingness Removal). With ProNorM, we could mitigate technical variation between instruments across extended periods and rescue approximately 20% of values missing for non-biological reasons.
ProNorM enabled the detection of peptide intensity changes at low concentrations in a dilution series comprising complex human cancer tissues, and allowed the prediction of tissue content in mixed samples with high accuracy by machine learning. Taken together, we demonstrate large-scale SWATH-MS data to be comparable over extended periods, providing a pathway toward reproducible clinical proteomics.
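
As an illustration of the general idea of replacing missing values using replicate measurements, the following is a minimal sketch and not the ProNorM implementation, which this abstract does not describe in detail. It assumes a matrix of normalised log-intensities (rows are runs, columns are peptides) with NaN marking missing values, and fills each gap with the median of the same peptide across replicate runs of the same sample. The function name, variable names, and the choice of the median are hypothetical and chosen only for illustration.

```python
import numpy as np

def impute_from_replicates(log_intensity, sample_ids):
    """Fill NaNs with the per-peptide median across replicate runs of the same sample.

    Hypothetical helper for illustration only; not the ProNorM method.
    Values stay NaN when no replicate of that sample observed the peptide.
    """
    X = np.array(log_intensity, dtype=float)  # work on a copy
    ids = np.asarray(sample_ids)
    for s in np.unique(ids):
        rows = np.where(ids == s)[0]
        block = X[rows]  # fancy indexing returns a copy of the replicate runs
        # Peptides observed in at least one replicate of this sample
        has_any = (~np.isnan(block)).any(axis=0)
        med = np.full(block.shape[1], np.nan)
        if has_any.any():
            med[has_any] = np.nanmedian(block[:, has_any], axis=0)
        # Only fill entries that are missing and have a replicate-derived median
        fill = np.isnan(block) & has_any
        block[fill] = np.broadcast_to(med, block.shape)[fill]
        X[rows] = block
    return X

# Toy data: two replicate runs of sample "A" and one run of sample "B"
runs = np.array([
    [10.0, np.nan, 5.0],
    [12.0, 8.0, np.nan],
    [np.nan, np.nan, 7.0],
])
filled = impute_from_replicates(runs, ["A", "A", "B"])
```

In this toy example the two gaps in sample "A" are filled from its other replicate, while the gaps in sample "B" remain missing because no replicate observed those peptides. A real pipeline would also need to distinguish values missing for technical reasons from those missing because the peptide is genuinely absent.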