Oral Presentation 25th Annual Lorne Proteomics Symposium 2020

MHCpLogics: a machine learning-based tool for unsupervised data visualisation and cluster analysis of immunopeptidomes (#32)

Mohammad Shahbazy¹, Pouya Faridi¹, Sri H Ramarathinam¹, Nathan P Croft¹, Anthony W Purcell¹
  ¹ Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Clayton, Victoria, Australia

Background In humans, the major histocompatibility complex (MHC) encodes the human leukocyte antigens (HLAs), which bind intracellular peptides and display them on the cell surface for recognition by T cells. The repertoire of peptides presented by HLA molecules is termed the immunopeptidome. Because HLA genes are highly polymorphic, each allele binds ligands with distinct sequence properties, known as peptide-binding motifs. Herein, we developed MHCpLogics, a machine learning-based tool for cluster analysis, amino acid-based feature selection, and sequence motif visualisation of peptides, to explore the motif landscape of human immunopeptidomes.

Methodology We used newly generated and previously published immunopeptidomics data from mono- and multi-allelic cell lines to cover a wide range of HLA alleles. Each peptide sequence was numerically encoded to enable downstream machine learning analysis, implemented in MATLAB and Python.
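The abstract does not say which numerical encoding MHCpLogics uses. As an illustration only, the sketch below assumes a simple one-hot scheme: each position of a peptide becomes a block of 20 indicators, one per standard amino acid, and shorter peptides are right-padded with zeros. The function names `one_hot_encode` and `encode_all` are hypothetical.

```python
# Hypothetical sketch: one-hot encoding of peptide sequences.
# The encoding actually used by MHCpLogics is not described in the abstract;
# one-hot encoding is assumed here purely for illustration.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide: str, max_len: int) -> list[float]:
    """Encode a peptide as a flat vector of length 20 * max_len.

    Residue a at position p sets element p * 20 + AA_INDEX[a] to 1.0.
    Positions beyond the peptide's length stay 0.0 (right padding).
    """
    if len(peptide) > max_len:
        raise ValueError(f"{peptide!r} is longer than max_len={max_len}")
    vec = [0.0] * (len(AMINO_ACIDS) * max_len)
    for pos, aa in enumerate(peptide.upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"non-standard residue {aa!r} in {peptide!r}")
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def encode_all(peptides: list[str]) -> list[list[float]]:
    """Encode a peptide list, padding every vector to the longest peptide."""
    max_len = max(len(p) for p in peptides)
    return [one_hot_encode(p, max_len) for p in peptides]


if __name__ == "__main__":
    peptides = ["SIINFEKL", "GILGFVFTL", "NLVPMVATV"]
    matrix = encode_all(peptides)
    print(len(matrix), len(matrix[0]))  # prints "3 180"
```

Right-padding keeps the N-terminal positions aligned, including the P2 anchor shared by many HLA class I motifs, but it shifts the C-terminal anchor between peptides of different lengths. Analysing one length at a time (for example only 9-mers) is one way around this.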

Results MHCpLogics performs dimensionality reduction, exploratory analysis, visualisation, and cluster analysis of peptide data, and exports sequence motif logos. For datasets of known HLA type, the tool clearly deconvoluted motif clusters: mono-allelic immunopeptidomes showed the restricted nature of single-allele motifs, while multi-allelic data yielded readily separable clusters. Across all datasets, contaminant sequences were easily identified and could be excluded from further analysis. The visualisation modes allow users to inspect clusters down to individual peptides, examine higher-level patterns, and view density plots and heatmaps. Additional statistical outputs include the proportion of HLA binders, hierarchical clustering dendrograms, and amino acid frequencies.
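The abstract names hierarchical clustering dendrograms and amino acid frequencies as outputs but does not specify the clustering method, distance or cut-off. The sketch below assumes average-linkage agglomerative clustering of equal-length peptides by Hamming distance, with the dendrogram "cut" at a hypothetical `max_distance`, followed by per-position amino acid frequencies for each cluster (the counts a sequence logo is drawn from). All function names and the toy peptides are invented for illustration.

```python
# Hypothetical sketch: average-linkage clustering of equal-length peptides
# by Hamming distance, then per-cluster position frequencies.
# Method, distance and cut-off are assumptions, not MHCpLogics' own.
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))


def average_linkage(peptides: list[str], max_distance: float) -> list[list[str]]:
    """Merge clusters while the closest pair's mean Hamming distance <= max_distance."""
    clusters = [[p] for p in peptides]
    while len(clusters) > 1:
        best = None  # (mean distance, i, j) of the closest cluster pair
        for i, j in combinations(range(len(clusters)), 2):
            dists = [hamming(a, b) for a in clusters[i] for b in clusters[j]]
            mean = sum(dists) / len(dists)
            if best is None or mean < best[0]:
                best = (mean, i, j)
        if best[0] > max_distance:
            break  # stop merging: this is where the dendrogram is cut
        _, i, j = best  # combinations() guarantees i < j
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def position_frequencies(cluster: list[str]) -> list[dict[str, float]]:
    """Per-position amino acid frequencies for one cluster of equal-length peptides."""
    n = len(cluster)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*cluster)
    ]


if __name__ == "__main__":
    # Toy 9-mers invented for illustration: two groups with distinct residues
    peptides = ["ALDEGKWTV", "ALDEGKWSV", "ALNEGKWTV", "RPQRSTFNL", "RPQRSTYNL"]
    for cluster in average_linkage(peptides, max_distance=3):
        freqs = position_frequencies(cluster)
        # Most frequent residue at P2 and at the C-terminus (P9)
        p2 = max(freqs[1], key=freqs[1].get)
        p9 = max(freqs[-1], key=freqs[-1].get)
        print(len(cluster), p2, p9)  # prints "3 L V", then "2 P L"
```

This naive loop recomputes all pairwise cluster distances at every merge, so it is only suitable for toy inputs. For real immunopeptidomes with thousands of peptides, an optimised routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the practical choice.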

Conclusion MHCpLogics can deconvolute large mass spectrometry-based immunopeptidome datasets, allowing interrogation of clusters and sub-clusters of peptide motifs, presenting data through a wide range of visualisation options, and exporting peptide sequence lists. We anticipate that the tool will be a valuable asset to the immunology community, enabling easy and rapid inspection of immunopeptidomes and, ultimately, identification of the HLA alleles present in unknown samples.