Mass spectrometry-based proteomics is a valuable method for interrogating healthy and diseased tissue samples at high depth. Recent technological advances now enable high-throughput proteomics, allowing the technique to operate at a scale previously achievable only in genomics. The Australian Cancer Research Foundation (ACRF) International Centre for the Proteome of Human Cancer (ProCan®) houses six SCIEX TripleTOF 6600 instruments in a single facility capable of processing approximately 10,000 tumour samples per year, enabling large-scale cancer tissue proteomics. To process this many samples efficiently, cohorts must be analysed in multiple batches, acquired on multiple instruments and over extended periods of time. Analysing such proteomic datasets therefore requires appropriate normalization and batch correction. However, it remains unclear how best to design these experiments given the inevitable practical constraints of a laboratory setting.
One desirable property of an experiment is that its results should be minimally affected by any inter-batch variation introduced in the laboratory. This is particularly difficult to achieve in clinically relevant studies of human tissue, where samples are limited and several potentially confounding variables, such as gender, age and cancer subtype, must be considered. We have developed a Monte Carlo algorithm that produces approximately blocked study designs based on investigator-nominated variables. The method is implemented as a C++ program with a Python wrapper and will be available as a package upon publication. The program outputs an Excel spreadsheet giving interpretable batch assignments and run orders for use in the laboratory. Users can assess the effectiveness of the blocking with several quality-control worksheets. Our method reduces the risk of conducting experiments in which technical and biological variation are confounded, and should be of significant value for the design of large-scale proteomic studies.
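To make the idea of Monte Carlo blocking more concrete, the Python sketch below shows one simple form such a search could take. It is not the ProCan implementation (which is a C++ program with a Python wrapper); the function names `imbalance` and `monte_carlo_blocking`, the squared-deviation balance score, the round-robin dealing into batches and the toy covariates are all illustrative assumptions. Each iteration shuffles the samples, deals them into near-equal batches and scores how far every nominated variable departs from proportional allocation across batches. The best design found is kept, and the run order within each batch is then randomised.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample record: an ID plus investigator-nominated covariates.
Sample = Dict[str, str]


def imbalance(batches: List[List[Sample]], variables: Sequence[str]) -> int:
    """Score how far the batches depart from proportional allocation.

    For each nominated variable and level, compare the observed count in a
    batch with the count expected if that level were spread in proportion
    to batch size. Deviations are scaled by n to stay in integer arithmetic.
    """
    n = sum(len(b) for b in batches)
    score = 0
    for var in variables:
        totals = Counter(s[var] for b in batches for s in b)
        for batch in batches:
            counts = Counter(s[var] for s in batch)
            for level, total in totals.items():
                # n * (observed - expected), where expected = total * len(batch) / n
                score += (n * counts[level] - total * len(batch)) ** 2
    return score


def monte_carlo_blocking(
    samples: Sequence[Sample],
    n_batches: int,
    variables: Sequence[str],
    n_iter: int = 10_000,
    seed: int = 0,
) -> Tuple[List[List[Sample]], float]:
    """Keep the lowest-imbalance assignment found among n_iter random shuffles."""
    rng = random.Random(seed)
    order = list(samples)
    best: List[List[Sample]] = []
    best_score: float = float("inf")
    for _ in range(n_iter):
        rng.shuffle(order)
        # Deal round-robin so batch sizes differ by at most one sample.
        batches = [order[i::n_batches] for i in range(n_batches)]
        score = imbalance(batches, variables)
        if score < best_score:
            best, best_score = batches, score
            if score == 0:
                break  # perfectly balanced; the score cannot go lower
    # Randomise acquisition order within each batch so that run position
    # is not systematically tied to any covariate.
    for batch in best:
        rng.shuffle(batch)
    return best, best_score


# Toy cohort (hypothetical): 12 samples with two binary covariates.
samples = [
    {"id": f"S{i:02d}", "sex": "F" if i % 2 else "M", "subtype": "A" if i < 6 else "B"}
    for i in range(12)
]
batches, score = monte_carlo_blocking(samples, n_batches=3, variables=["sex", "subtype"])
for k, batch in enumerate(batches, start=1):
    print(f"batch {k}:", [s["id"] for s in batch])
```

Random search of this kind generally returns an approximately, rather than exactly, blocked design. This toy cohort happens to admit a perfectly balanced split into three batches of four, so the loop can stop early once the imbalance score reaches 0.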